ABSTRACT


Digital encoding of speech to allow more efficient
transmission at low data rates involves the decomposition
of the speech waveform into various parameters which are
related to the physical structure of the speech production
process. In this thesis, linear predictive coding is used
to produce a set of coefficients for the z-domain
characteristic polynomial of successive 25 msec segments of
the vocal tract output. The locations of the poles in the
z-plane and the excitation pitch period are then shifted,
and the signal is reformulated to change the overall
frequency characteristics of the speech waveform while
maintaining the perceived sounds and information content.
The resulting audio tapes confirm the theory and
conjectures of the thesis.
I. INTRODUCTION


Digital processing of speech signals has become
important and necessary with the introduction of high-speed
digital devices into every phase of communication: place to
place, man to machine, and machine to man.

Digital signals have a number of inherent advantages
over analog signals. Digital signals may be coded for
security or for noise immunity. A digital voice signal may
be transmitted by the same equipment used for data, and it
may be multiplexed with that data. One of the primary
disadvantages of the digital transmission of voice is the
large bandwidth required by some digital techniques.

When analog techniques, such as single-sideband amplitude
modulation, produced bandwidths of 5 kHz and the best digital
system bandwidth was 64 kHz, there was a very strong
tendency to stay with the analog techniques.

However, recent advances in digital signal processing
have made the digital transmission of voice highly
efficient. Until recently, digital transmission of speech
was possible only by sampling the voice waveform at a
sufficiently high rate and then performing an
analog-to-digital conversion of each sample. Enough bits
were transmitted for each sample to allow the waveform to
be reconstructed at the receiver. The voice waveform must
be sampled at approximately 8,000 samples per second to
avoid a loss of clarity, and each sample must then be
converted to a 6-10 bit number for transmission. The
overall data rate using these methods therefore had a lower
limit in the neighborhood of 48,000 bits per second
(8,000 samples per second x 6 bits per sample).
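These figures come from multiplying the sampling rate by the number of bits per sample. The following is a minimal Python sketch of that arithmetic. The helper name `pcm_bit_rate` is hypothetical and is used only for illustration.

```python
def pcm_bit_rate(samples_per_second: int, bits_per_sample: int) -> int:
    """Bit rate of plain PCM, where every sample is sent as a fixed-length code."""
    return samples_per_second * bits_per_sample


# 8,000 samples per second at the 6 to 10 bits per sample quoted above
for bits in (6, 8, 10):
    print(f"{bits} bits/sample -> {pcm_bit_rate(8000, bits)} bits/sec")
```

At 6 bits per sample this gives the 48,000 bits per second lower limit. At 10 bits per sample the rate rises to 80,000 bits per second.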

Recent developments have allowed the voice pattern to
be broken down into more basic parameters which are closely
associated with the physical production of speech. These
parameters vary rather slowly and can be transmitted at a
lower rate. Data rates as low as 1200 bits per second have
been achieved through the use of these techniques.

These methods are numerical representations of the
physical production of speech, and therefore it is easier
to alter the characteristics of speech by altering the
associated parameters than by trying to alter the waveform
directly.

This thesis reviews various digital speech processing
techniques for use in a speech modification system. Linear
predictive coding (LPC) was chosen for implementation, and
therefore the theory and practice of this technique are
explained in detail. The desired modification of the
speech waveform by shifting the poles of its characteristic
polynomial, and the regeneration of the altered waveform,
are discussed, and the implementation techniques are
explained. The IBM 360 computer was used to simulate the
techniques developed. This simulation is covered in detail,
and the computer programs, with results, are provided.
II. SPEECH PRODUCTION AND CHARACTERISTICS


Any digital system for altering speech characteristics
must be based on knowledge of those characteristics and the
physical structure which determines them.

A. SPEECH CHARACTERISTICS

All speech can be broken down into a set of distinctive
sounds called phonemes. In the case of American English,
there are generally considered to be 42 distinct phonemes,
which are classified into vowels, diphthongs, semivowels
and consonants. Spoken communication is accomplished
through various combinations of these sounds, and the
accurate reproduction of each is a major criterion in
judging voice processing systems. Phonemes are generated
at a rate of about ten per second. Each phoneme is
classified as voiced if vocal cord vibration is the source
of the sound, or unvoiced if the sound is produced by other
means. If the characteristics of a phoneme change from
start to finish, the phoneme is called noncontinuant. Those
phonemes which are stationary are called continuant.

The lowest frequency present in a given voiced sound is
called the pitch frequency. Peaks in the spectral
representation of a speech sound that lie above the pitch
frequency are called formants, and they are numbered
consecutively with increasing frequency. Although two
speakers may produce the same phoneme, their pitch and
formant frequencies may be different. However, general
relationships between pitch and formant frequencies may be
established which remain relatively constant from speaker
to speaker for the same phoneme. If information is to be
retained by a speech processing system, the system must be
able to reproduce at its output the pitch and formant
frequency relationships which were present at its input.

B. PHYSICAL SPEECH PRODUCTION STRUCTURE

The vocal tract is a resonant tube with the vocal cords
at one end and the lips at the other. The vocal tract acts
as a frequency-selective filter whose transfer function
depends on how it is shaped at any given time.


FIGURE 1. SOUND PRODUCTION: (A) VOICED, (B) UNVOICED

The input to the vocal tract is caused either by the
vibration of the vocal cords at the lower end (figure 1.a)
or by the turbulence of air being forced through a
constriction at any of a number of locations along the
vocal tract (figure 1.b). The vocal tract acts as a filter
with a pulsed input from the vocal cords when producing
voiced sounds such as 'a' or 'o'. During fricative sounds
like 's' or 'f', which are caused by forcing air through a
constriction, the vocal tract acts as a resonant cavity
with certain characteristic response frequencies. Typical
waveforms for voiced and unvoiced sounds are shown in
figure 2.

FIGURE 2. TYPICAL WAVEFORMS: VOICED AND UNVOICED

Certain characteristics of the vocal tract are changed
several times per second to produce different sounds.
Others, such as overall length and the limits of its
diameter, are fixed for a given speaker. A detailed look at
each type of sound will help ensure that the digital
processor used has the same flexibility as the actual
speaker.

Vowels, which are voiced continuant sounds, are produced
when the vocal cords vibrate, causing pulses of air at the
bottom of the vocal tract. The shape of the vocal tract
remains fixed during vowel production, so the tract acts as a
stationary filter that responds to the forcing function.

The production of diphthongs and semivowels is similar
to that of vowels, except that the shape of the vocal tract
is smoothly changed during voicing. Diphthongs and
semivowels are therefore noncontinuant voiced sounds.

The phonemes classified as consonants may be further
divided into the subcategories of voiced fricatives,
unvoiced fricatives, stops and nasals. Fricatives are
caused by the steady flow of air through a constriction in
the vocal tract. This flow produces turbulent air motion and a
seemingly random air pressure pattern. Fricatives are
voiced or unvoiced depending on whether the vocal cords are
producing pressure pulses at the same time. Stops, or
plosives, are caused by completely closing the vocal tract
and then suddenly opening it to start sound production
abruptly. A stop is classified as voiced or unvoiced
depending on the nature of the sound that follows the
opening of the vocal tract. Nasals are voiced sounds which
are formed when the vocal tract is closed at the mouth and
air is allowed to pass through the nasal cavity. This acts
as a feed-forward path for the sound and causes a
corresponding change in the total vocal tract response.

C. INFORMATION CONTENT


One of the primary goals of speech processing is the
development of efficient codes for transmitting or storing
speech that still allow it to be reconstructed without
excessive loss of information. The source coding theorem
states that, through the proper choice of code, a source
can be encoded with an average number of bits per symbol
arbitrarily close to the entropy of that source. However,
efficient codes are difficult to find even for simple
binary sources, let alone for a continuous speech source.
An estimate of the entropy of a typical speech source
provides a useful gauge for measuring the data rate
performance of any system.

If 'excessive loss of information' occurs only when we
do not receive the correct one of the 42 phonemes, the
information content of one second of speech is
approximately (assuming 10 phonemes are produced per
second):

$$H = 10\sum_{i=1}^{42} P(p_i)\left(-\log_2 P(p_i)\right)$$

where $P(p_i)$ is the probability of the $i$th phoneme.
Assuming further that each phoneme is equally likely,

$$H = 10 \times 42 \times \frac{1}{42} \times \log_2 42 \approx 54 \text{ bits per second}$$

If the actual probability of each phoneme were used, i.e.
if the phonemes are not equally likely, the value of the
entropy would be significantly lower.
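The following is a short Python sketch of the entropy-rate calculation above. Like the text, it assumes ten phonemes per second from a memoryless source. The skewed distribution is purely hypothetical and is not measured phoneme data. It only illustrates that any non-uniform distribution gives a lower entropy than the uniform one.

```python
import math


def entropy_rate(probabilities, symbols_per_second=10):
    """Entropy rate, in bits per second, of a memoryless source that emits
    `symbols_per_second` symbols with the given probabilities."""
    h = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return symbols_per_second * h


uniform = [1 / 42] * 42
print(f"{entropy_rate(uniform):.1f}")   # 53.9 bits per second

# Hypothetical skewed distribution (NOT real phoneme statistics)
weights = [1 / (i + 1) for i in range(42)]
skewed = [w / sum(weights) for w in weights]
print(f"{entropy_rate(skewed):.1f}")    # lower than the uniform case
```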

If 'excessive loss of information' also includes
failure to identify the speaker and failure to indicate the
speaker's emotional state, the information content is
higher. However, suppose identification of the speaker (one
of about two billion) is only required once per minute, and
the speaker's emotional state (say one of ten) can only
change once per second. The entropy is then still only
about 58 bits per second:

$$H(\text{speaker}) = \frac{1}{60}\sum_{i=1}^{2\times 10^{9}}\frac{1}{2\times 10^{9}}\left(-\log_2\frac{1}{2\times 10^{9}}\right) \approx 0.5 \text{ bits per second}$$

$$H(\text{emotion}) = 10\times\frac{1}{10}\times\left(-\log_2\frac{1}{10}\right) \approx 3.3 \text{ bits per second}$$

$$H(\text{phoneme}) \approx 54 \text{ bits per second}$$

$$H(\text{total}) \approx 58 \text{ bits per second}$$

Clearly, the current state of the art in speech coding,
with data rates of 1200 bits per second and above, is not
close to this theoretical limit.
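This sketch adds up the three entropy terms above. It uses the text's assumptions: one speaker identity out of about two billion per minute, one emotional state out of ten per second, and ten equally likely phonemes out of 42 per second.

```python
import math

# Rates assumed in the text, all with equally likely outcomes
h_speaker = math.log2(2e9) / 60   # one of ~2e9 speakers, once per minute
h_emotion = math.log2(10)         # one of 10 states, once per second
h_phoneme = 10 * math.log2(42)    # one of 42 phonemes, 10 per second
h_total = h_speaker + h_emotion + h_phoneme

for name, h in [("speaker", h_speaker), ("emotion", h_emotion),
                ("phoneme", h_phoneme), ("total", h_total)]:
    print(f"H({name}) = {h:.1f} bits per second")
```

The total comes out at about 57.8 bits per second, which rounds to the 58 bits per second quoted above.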
III. DIGITAL SPEECH PROCESSING TECHNIQUES


Digital speech processing techniques may be placed into
three general categories based on the assumptions used in
their development. The first category is that of waveform
techniques, where the only primary assumption is that the
signal being processed is band-limited to no more than
half of the sampling frequency. The second category, the
spectral methods, adds the assumption that the
frequency-domain characteristics of the speech waveform
vary slowly. Finally, the voice tract parameter techniques
assume that the physical voice production system can be
modeled digitally.

A. WAVEFORM METHODS

Waveform techniques operate equally well on any
low-pass filtered waveform, and all are generally based on
the familiar pulse code modulation (PCM). The basic
requirements of a waveform quantization method are that the
waveform be sampled at more than twice the highest
frequency present and that the samples be quantized into a
digital code for transmission. Although this technique is
very straightforward, it also requires a high data rate. A
waveform sampled 9600 times per second, with each sample
quantized to 256 levels (8 bits), would require 76,800 bits
per second for transmission. A number of variations
(differential modulation and adaptive differential
modulation) have been used to reduce the required data
rate, but they have failed to cut it by more than about
half.
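Below is a minimal sketch of uniform PCM. It assumes samples scaled to the range [-1, 1) and uses a hypothetical 440 Hz test tone. The names `pcm_encode` and `pcm_decode` are illustrative. The sketch reproduces the 76,800 bits per second figure and shows that the reconstruction error is bounded by half a quantization step.

```python
import numpy as np


def pcm_encode(x, bits=8):
    """Uniformly quantize samples in [-1, 1) to integer codes 0 .. 2**bits - 1."""
    levels = 2 ** bits
    codes = np.floor((np.asarray(x) + 1.0) / 2.0 * levels)
    return np.clip(codes, 0, levels - 1).astype(int)


def pcm_decode(codes, bits=8):
    """Map each code back to the midpoint of its quantization interval."""
    levels = 2 ** bits
    return (np.asarray(codes) + 0.5) / levels * 2.0 - 1.0


fs = 9600                               # samples per second
t = np.arange(fs) / fs                  # one second of signal
x = 0.8 * np.sin(2 * np.pi * 440 * t)   # hypothetical test tone
codes = pcm_encode(x, bits=8)           # 256 levels
print("bits per second:", codes.size * 8)
print("max error:", np.max(np.abs(pcm_decode(codes) - x)))  # at most 1/256
```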

B. SPECTRAL TECHNIQUES

1. Short Term Frequency Analysis

These methods deal with the short-term frequency
properties of the speech signal. An early spectral method
was the channel vocoder. The transmitting processor of the
channel vocoder consists of a bank of narrow-band analog
filters. The energy passed by each filter is measured and
transmitted to the receiver site. The processor also
determines whether the input speech was voiced or unvoiced
and transmits that decision. In the receiver, an excitation
signal, determined by the voicing decision, is fed into a
bank of narrow-band filters, each of which has an
adjustable gain determined by the received energy
measurements.

The same technique can be implemented in an all-digital
form by replacing the bank of analog filters with digital
filters or by performing a discrete Fourier transform (DFT)
on a frame of input samples. The DFT is usually preferred
because of its computational efficiency and the
availability of high-speed DFT array processors. Normally
each input frame is windowed to reduce the spurious
spectral components (leakage) caused by the sharp cutoff at
the ends of a frame. When this method is used to reduce the
data rate required for digital transmission, the full DFT
of each frame is not transmitted. It would require the same
number of bits as the frame of samples (assuming both are
quantized to the same number of levels). Instead, the data
rate can be reduced by skipping frames and assuming, during
reconstruction, that they duplicate the preceding frame.
The number of frequencies resolved by the DFT is half the
number of samples in the frame, so the frame length for
analysis is chosen as a compromise between accuracy of
voice reproduction and the desire for a low data rate.
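The DFT analysis described above might be sketched as follows. The sketch assumes a 10 kHz sampling rate, 256-sample frames and a Hamming window, since the text does not name a particular window. The function name and the 1 kHz test tone are hypothetical. numpy's `rfft` returns only the non-redundant bins of a real frame, from DC through half the sampling rate.

```python
import numpy as np


def short_time_spectra(x, frame_len=256, hop=256):
    """Magnitude spectra of successive Hamming-windowed frames.

    Each real frame of `frame_len` samples yields frame_len // 2 + 1
    non-redundant frequency bins (DC through half the sampling rate)."""
    window = np.hamming(frame_len)
    spectra = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len] * window   # taper the frame ends
        spectra.append(np.abs(np.fft.rfft(frame)))
    return np.array(spectra)


fs = 10000                                 # assumed sampling rate
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)           # hypothetical 1 kHz tone
S = short_time_spectra(x)
print(S.shape)                             # (frames, bins)
peak_bin = int(np.argmax(S[0]))
print(peak_bin * fs / 256, "Hz")           # within one bin of 1000 Hz
```

Skipping frames, as the text describes, would amount to increasing `hop` beyond `frame_len`.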

This method of speech processing would lend itself
well to altering the frequency characteristics of voice
signals. However, it requires a relatively high data
transmission rate and therefore was not desirable for
speech processing in conjunction with place-to-place
communications or with digitally stored speech.

2. Homomorphic Processing

Another method which involves frequency-domain
processing is homomorphic processing. It is based on the
following three principles:

(1) Speech is the convolution of an excitation
function and the transfer function of the vocal
tract.

(2) Convolution in the time domain is equivalent to
multiplication in the frequency domain.

(3) The Fourier transform is a linear
transformation, i.e.

$$\mathcal{F}\{x(t) + y(t)\} = \mathcal{F}\{x(t)\} + \mathcal{F}\{y(t)\} = X(\omega) + Y(\omega)$$


A method of separating a speech waveform back into its
excitation and vocal tract components would help us analyze
the speech. Homomorphic processing centers on the efficient
deconvolution of these signals.

First, the input signal is windowed and transformed
via the DFT to produce the frequency-domain representation
of the input speech. The time convolution of two signals is
equivalent to multiplication in the frequency domain.
However, knowing the product of two waveforms does little
toward gaining knowledge of the multiplicands unless
further information is given. The multiplication of two
values at a given frequency is equivalent to adding their
logarithms. The logarithm of the magnitude is therefore
taken of each value in the frequency-domain representation
of the signal. The result equals the log of the
frequency-domain representation of the excitation function
plus the log of the frequency-domain representation of the
vocal tract function. However, it is easier to tell the
vocal tract and excitation functions apart in the time
domain, so the inverse DFT is taken of the log of the
frequency-domain function. The function produced is called
the cepstrum of the signal. The inverse DFT is a linear
operation, and the frequency-domain function was the sum of
two component functions. The time-domain cepstrum must
therefore also be the sum of the cepstrum of the excitation
function and the cepstrum of the vocal tract function.
Figure 3 illustrates the relationship between the steps of
homomorphic deconvolution of signals.

FIGURE 3. HOMOMORPHIC DECONVOLUTION
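The cepstrum computation can be sketched in a few lines of numpy. This sketch uses the real cepstrum, which keeps only the log magnitude. That matches the text's later step of working backwards to magnitude. The Hamming window and the 1e-12 floor are assumptions. The check at the end uses circular convolution, which is an exact product of DFTs, with no window. Under those conditions the cepstra add exactly. With windowing and ordinary convolution the relation holds only approximately.

```python
import numpy as np


def real_cepstrum(frame, window=True):
    """Inverse DFT of the log magnitude of the DFT of one frame.

    A small floor keeps log() finite where a spectral value is exactly zero."""
    x = np.asarray(frame, dtype=float)
    if window:
        x = x * np.hamming(len(x))
    log_mag = np.log(np.maximum(np.abs(np.fft.fft(x)), 1e-12))
    return np.real(np.fft.ifft(log_mag))


rng = np.random.default_rng(0)
x = rng.standard_normal(512)                 # stand-in excitation
h = np.zeros(512)
h[:3] = [1.0, -0.9, 0.2]                     # stand-in short vocal-tract response
y = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)))   # circular convolution
err = np.max(np.abs(real_cepstrum(y, window=False)
                    - real_cepstrum(x, window=False)
                    - real_cepstrum(h, window=False)))
print(err)   # on the order of floating-point rounding: the cepstra add
```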
Examination of the cepstrum between 2.5 and 20 msec
may reveal a peak that is considerably above the
background noise level. If a peak is there, the segment is
determined to be voiced, with the peak occurring at the
pitch period. The vocal tract is not long enough to sustain
any vibrations for more than 20 msec after a pulsed input.
If there is no peak, the segment is considered unvoiced.

The cepstrum of the excitation function may be subtracted
from the total cepstrum, and the remainder considered an
estimate of the cepstrum of the vocal tract transfer
function. After working backwards to magnitude (vs. log of
magnitude) in the frequency domain, the filter coefficients
may be determined.
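Below is a sketch of the voicing and pitch decision. The text gives the 2.5 to 20 msec search range but no detection threshold. The sketch therefore assumes a Hamming window and an arbitrary threshold of 0.1. The test signal is hypothetical: a 125 Hz pulse train passed through a single two-pole resonance.

```python
import numpy as np


def cepstral_pitch(frame, fs, threshold=0.1):
    """Pitch period in seconds from the real cepstrum, or None if unvoiced.

    Searches for the largest cepstral value between 2.5 and 20 msec; the
    threshold is an assumed decision level, not a value from the text."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    log_mag = np.log(np.maximum(np.abs(np.fft.fft(x)), 1e-12))
    ceps = np.real(np.fft.ifft(log_mag))
    lo, hi = int(round(0.0025 * fs)), int(round(0.020 * fs))
    k = lo + int(np.argmax(ceps[lo:hi + 1]))
    return k / fs if ceps[k] > threshold else None


# Hypothetical voiced signal: 125 Hz pulse train (80 samples at 10 kHz)
# through one two-pole resonance at 500 Hz with pole radius 0.9.
fs = 10000
u = np.zeros(2048)
u[::80] = 1.0
r, theta = 0.9, 2 * np.pi * 500 / fs
a1, a2 = 2 * r * np.cos(theta), -r * r
y = np.zeros_like(u)
for n in range(len(u)):
    y[n] = u[n]
    if n >= 1:
        y[n] += a1 * y[n - 1]
    if n >= 2:
        y[n] += a2 * y[n - 2]

print(cepstral_pitch(y[:512], fs))        # pitch period of the pulse train
print(cepstral_pitch(np.zeros(512), fs))  # None: no peak, treated as unvoiced
```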

It would be relatively straightforward to alter
both the excitation function and the vocal tract transfer
function after the total cepstrum is broken into its
additive components. However, homomorphic processing was
not widely used for voice communication, and this
technique was dropped in favor of a more widely used
system. As array fast Fourier transform processors become
faster and less expensive, homomorphic speech processing
may become the dominant speech communication technique.
C. VOICE TRACT PARAMETER TECHNIQUES IN THE TIME DOMAIN

The primary characteristic of this category is the
close tie between the digital process and the physical
structure being modeled. Homomorphic processing uses the
deconvolution of the vocal tract function and the
excitation function as a primary tool. However, it requires
transformations to and from the frequency domain and
therefore is not included in this category. The primary
member of this category is the linear predictive coding
(LPC) process, which has shown itself to be among the best
and most versatile of the various speech processing
techniques.

1. The Speech Model

The speech model assumed and used for LPC is that
of a time-varying digital filter which is excited by a
wide-band function, either a pulsed input or random noise.
This is illustrated in figure 4. The recursive filter used
to model the vocal tract is all-pole and has slowly
time-varying (pseudo-stationary) coefficients. The filter's
z-domain transfer function is

$$H(z) = \frac{Y(z)}{U(z)} = \frac{1}{1 - \sum_{i=1}^{p} a_i z^{-i}}$$

so that

$$Y(z) = U(z) + \left(\sum_{i=1}^{p} a_i z^{-i}\right) Y(z)$$

or, in the discrete time domain,

$$Y(nT) = U(nT) + \sum_{i=1}^{p} a_i\, Y((n-i)T)$$

From the time-domain equation it is clear that the current
output, Y(nT), is uniquely specified in terms of the current
input and the past p output values.
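The time-domain recursion translates directly into code. Below is a minimal Python sketch that assumes zero initial conditions, so outputs before the first sample are taken as zero. The name `all_pole_synthesis` is illustrative.

```python
import numpy as np


def all_pole_synthesis(u, a):
    """Y(nT) = U(nT) + sum_{i=1..p} a_i Y((n-i)T), taking Y = 0 before n = 0.

    `a` lists a_1 .. a_p, so a[0] is a_1."""
    y = np.zeros(len(u))
    for n in range(len(u)):
        acc = float(u[n])
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc += ai * y[n - i]   # weighted past output
        y[n] = acc
    return y


# Impulse response of a one-pole example (a_1 = 0.5): 1, 0.5, 0.25, 0.125
print(all_pole_synthesis([1.0, 0.0, 0.0, 0.0], [0.5]))
```

Each output needs only the current input and the p previous outputs, as stated above.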


FIGURE 4. SPEECH MODEL
The vocal tract is not always best modeled by an all-pole
filter; nasal sounds in particular would probably be
best modeled by a filter which also included zeros. However,
there is considerable difficulty in rapidly estimating both
the poles and zeros of a transfer function when only a short
segment of the output is available for analysis. Experience
has shown that high quality voice reproduction is possible
using an all-pole filter of adequate order.

The order of the filter required is closely related
to the length of the vocal tract. To adequately represent
the lower frequency response of the vocal tract, the filter
must include recursive delay equal to the delay encountered
by sound waves traveling from the vocal cords to the lips
and returning to the glottis:

velocity of sound = 344 m/sec
length of vocal tract = 17 cm

$$\frac{2 \times 0.17 \text{ m}}{344 \text{ m/sec}} \approx 0.988 \text{ msec}$$

At a sampling rate of 10 kHz (one sample every 0.1 msec),
at least 10 past values would need to be included for an
accurate model.
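The same estimate in code, using the text's values for the speed of sound and the vocal tract length. Rounding up to a whole number of samples is an assumption of this sketch.

```python
import math

SPEED_OF_SOUND = 344.0   # m/sec, from the text
TRACT_LENGTH = 0.17      # m, from the text

round_trip = 2 * TRACT_LENGTH / SPEED_OF_SOUND   # seconds, glottis-lips-glottis


def min_predictor_order(fs):
    """Past samples needed to span the round-trip delay at sampling rate fs."""
    return math.ceil(round_trip * fs)


print(f"round trip = {round_trip * 1e3:.3f} msec")   # 0.988 msec
print(min_predictor_order(10000))                     # 10
```

The text's two extra poles for the glottal pulse shape would be added on top of this count.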

The excitation function for voiced sounds is
modeled by a train of pulses at the glottis. Clearly these
pulses cannot be a perfect set of impulses; rather, they
must have a finite width and are likely to have a definite
shape. Rather than construct a separate filter to change
the impulses into the correct shape, additional poles are
added to the model so that the combined transfer function
may be calculated at once. Normally two additional poles
are adequate for the pulse shape model.

2. Linear Predictive Techniques

Linear predictive analysis is based on the division
of speech modeling into modeling of the excitation function
and modeling of the vocal tract transfer function. The
vocal tract is modeled by computing each sample as a
weighted linear combination of previous samples. Linear
predictive coding of speech is accomplished by filtering a
sampled speech waveform through a filter which is the
inverse of the filter that models the vocal tract. If the
filter used is the inverse of a good model of the vocal
tract, the output will be a good approximation of the
excitation function. The various properties of the
excitation function, along with the coefficients used in
the vocal tract filter, are measured and transmitted as
shown in figure 5.
FIGURE 5. ENCODING PROCESS
The received measurements are used in the decoding
processor to reconstruct the excitation function and the
filter. The process of reconstructing the speech waveform
is shown in figure 6.
FIGURE 6. DECODING PROCESS

The primary advantage of linear predictive coding of
speech is the reduction in the data rate required for
transmission or storage. LPC systems have been developed
which require data rates from 3000 to 4800 bits per second
for high quality voice communication, and rates as low as
1200 bits per second have been reported for lower quality
but understandable speech. Highly efficient algorithms have
been developed for the encoding and decoding of speech
using the LPC technique. When implemented in hardware with
special-purpose, short-word-length microprocessors, the
computations required for two-way communication have been
done in 65% of real time.

LPC was chosen as the method to be used for
accomplishing the desired voice characteristic
modifications. A detailed description of the theory and
modeling assumptions follows.
IV. LINEAR PREDICTION THEORY


Linear prediction is an extension of least squares
estimation. In the case of one-dimensional linear
prediction, it is more commonly labeled time series
analysis when used by statisticians for the analysis of
everything from population to the stock market.


A. THEORY


It is assumed that each sample of the discrete time
series, s(kT), as shown in figure 7, may be approximated by
a linear combination of past samples of the time series:

$$\hat{s}(kT) = \sum_{i=1}^{m} a_i\, s((k-i)T)$$

where $\hat{s}(kT)$ is the estimated sample value, $a_i$ is
the coefficient of the sample $i$ steps past, and $m$ is the
order of the approximation (and, as we will see later, the
order of the z-domain filter of the model).
For a portion of the discrete time series (N samples, where
N > m), a least squares approximation of the weighting
coefficients, $a_i$, may be calculated. The estimate at each
point,

$$\hat{s}(kT) = \sum_{i=1}^{m} a_i\, s((k-i)T), \qquad 1 \le k \le N$$

is subtracted from the actual sample value to give the
error for each estimate, e(kT):

$$e(kT) = s(kT) - \hat{s}(kT) = s(kT) - \sum_{i=1}^{m} a_i\, s((k-i)T), \qquad 1 \le k \le N$$

To minimize the error (in a least squares sense), the error
is squared and summed over all points in the region of
interest to obtain an overall error, E:

$$E = \sum_{k=1}^{N} e^2(kT) = \sum_{k=1}^{N}\left[s(kT) - \sum_{i=1}^{m} a_i\, s((k-i)T)\right]^2$$

The derivative of E with respect to each of the
coefficients, $a_j$, is taken and set equal to zero in order
to locate the minimum of E. This yields the following m
equations:

$$\frac{\partial E}{\partial a_j} = 0 = \sum_{k=1}^{N} 2\left[s(kT) - \sum_{i=1}^{m} a_i\, s((k-i)T)\right] \frac{\partial}{\partial a_j}\left[s(kT) - \sum_{i=1}^{m} a_i\, s((k-i)T)\right], \qquad 1 \le j \le m$$

However,

$$\frac{\partial}{\partial a_j}\, s(kT) = 0$$

and

$$\frac{\partial}{\partial a_j}\left[a_i\, s((k-i)T)\right] = \begin{cases} 0, & i \ne j \\ s((k-j)T), & i = j \end{cases}$$

therefore

$$\frac{\partial E}{\partial a_j} = 0 = \sum_{k=1}^{N} 2\left[s(kT) - \sum_{i=1}^{m} a_i\, s((k-i)T)\right](-1)\, s((k-j)T), \qquad 1 \le j \le m$$

Removing the constant multiplier,

$$0 = \sum_{k=1}^{N} s(kT)\, s((k-j)T) - \sum_{k=1}^{N}\sum_{i=1}^{m} a_i\, s((k-i)T)\, s((k-j)T), \qquad 1 \le j \le m$$

and changing the order of summation,

$$\sum_{k=1}^{N} s(kT)\, s((k-j)T) = \sum_{i=1}^{m} a_i \sum_{k=1}^{N} s((k-i)T)\, s((k-j)T), \qquad 1 \le j \le m$$
Given all of the samples within the summations over N,
the above set of m equations in the m unknowns, $a_i$, can be
solved. Suppose, however, that only the samples

$$s(kT), \qquad 1 \le k \le N$$

are given. The set of equations above then cannot be solved,
because it also requires the samples

$$s((1-j)T), \qquad 1 \le j \le m$$

which lie before the segment. However, the samples can be
windowed so that all samples outside the region of interest
are zero:

$$s(kT) = 0, \qquad k \le 0 \text{ and } k > N$$

The summations over N in the set of equations above may then
be replaced by the autocorrelation of the windowed samples,
s'(kT):

$$R(j) = \sum_{k=1}^{N-j} s'(kT)\, s'((k+j)T), \qquad 0 \le j \le m$$

This assumption may be made because the number of samples,
N, is normally much greater than the order, m, of the set
of equations, so relatively few samples are lost. The
window function used will not significantly alter the
samples in the center of the frame, and therefore the
resulting coefficients will be a good approximation for
that segment. Since R(-j) = R(j), the set of linear
equations may now be written

$$R(j) = \sum_{i=1}^{m} a_i\, R(|i-j|), \qquad 1 \le j \le m$$

These equations may now be solved for the linear predictive
coefficients, $a_i$, $1 \le i \le m$.
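Below is a sketch of the autocorrelation formulation. It assumes a Hamming window, since the text does not specify one. For clarity it solves the Toeplitz system with a general linear solver. The Levinson-Durbin recursion, which is not described here, is the usual more efficient choice. The test signal is a hypothetical two-pole process driven by white noise.

```python
import numpy as np


def lpc_autocorrelation(frame, m):
    """Autocorrelation-method LPC: window the frame (zero outside it), form
    R(0..m), and solve R(j) = sum_i a_i R(|i-j|) for 1 <= j <= m."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    N = len(x)
    R = np.array([np.dot(x[:N - j], x[j:]) for j in range(m + 1)])
    toeplitz = np.array([[R[abs(i - j)] for i in range(1, m + 1)]
                         for j in range(1, m + 1)])
    return np.linalg.solve(toeplitz, R[1:])   # a_1 .. a_m


rng = np.random.default_rng(0)
e = rng.standard_normal(4000)            # hypothetical white-noise excitation
s = np.zeros_like(e)
for n in range(len(e)):
    s[n] = e[n]
    if n >= 1:
        s[n] += 1.3 * s[n - 1]
    if n >= 2:
        s[n] -= 0.8 * s[n - 2]
a = lpc_autocorrelation(s, m=2)
print(a)                                 # should be close to [1.3, -0.8]
```

A useful property of this formulation is that the resulting all-pole filter is stable: all roots of $1 - \sum a_i z^{-i}$ lie inside the unit circle.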

The system being studied may be stationary, or we may be
considering only a pseudo-stationary segment of the system
output. If, in addition, the order of the model is
sufficiently close to the order of the real system, future
values of the variable may be calculated recursively from
previous values. In the following section we will see how
this theory is applied to speech modeling and reconstruction.


B. LINEAR PREDICTIVE CODING FOR VOICE ANALYSIS

The digital model used for speech synthesis is shown in
figure 8. The discrete time excitation function is e(nT),
and the synthesized speech output is s(nT).

FIGURE 8. SPEECH SYNTHESIS MODEL


The vocal tract filter is assumed to be all-pole and
therefore can be represented by the z-domain equation

$$H(z) = \frac{S(z)}{E(z)} = \frac{z^m}{\prod_{i=1}^{m}(z - p_i)}$$

where $p_i$ are the poles of the filter. Multiplying out the
denominator and dividing both numerator and denominator by
$z^m$ yields

$$H(z) = \frac{S(z)}{E(z)} = \frac{1}{1 - \sum_{i=1}^{m} a_i z^{-i}}$$

This z-domain equation is converted to a discrete time
domain equation as follows:

$$S(z)\left(1 - \sum_{i=1}^{m} a_i z^{-i}\right) = E(z)$$

$$S(z) = E(z) + \sum_{i=1}^{m} a_i z^{-i} S(z)$$

$$s(nT) = e(nT) + \sum_{i=1}^{m} a_i\, s((n-i)T)$$

If the excitation function e(nT) equals zero for a given
sample, this equation has the same form as the first
equation in the previous section on the theory of linear
prediction. The coefficients of the z-domain filter
transfer function are therefore equivalent to the linear
prediction weighting coefficients.
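Because the modification in this thesis works by moving poles, it helps to convert between the pole locations $p_i$ and the coefficients $a_i$. The sketch below does this with numpy's `poly` and `roots`. The single 500 Hz resonance is a hypothetical example. The pole-angle scaling at the end only illustrates a pole shift. It is not the modification rule used in this thesis.

```python
import numpy as np


def poles_to_lpc(poles):
    """a_1..a_m such that prod(z - p_i) = z^m - sum_i a_i z^(m-i)."""
    c = np.poly(poles)            # [1, c_1, ..., c_m]
    return -np.real(c[1:])        # imaginary parts cancel for conjugate pairs


def lpc_to_poles(a):
    """Roots of z^m - a_1 z^(m-1) - ... - a_m."""
    return np.roots(np.concatenate(([1.0], -np.asarray(a, dtype=float))))


fs = 10000
p = 0.95 * np.exp(1j * 2 * np.pi * 500 / fs)   # one resonance at 500 Hz
a = poles_to_lpc([p, np.conj(p)])
print(a)   # a_1 = 2(0.95)cos(0.1 pi), a_2 = -0.9025

# Illustrative shift: scale each pole angle by 1.2, keeping its radius,
# which moves the resonance from 500 Hz to 600 Hz.
shifted = [abs(q) * np.exp(1j * 1.2 * np.angle(q)) for q in lpc_to_poles(a)]
a_shifted = poles_to_lpc(shifted)
print(a_shifted)
```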

Analysis of the sampled speech waveform is used to calculate the prediction coefficients, which are then used in an inverse filter to determine the excitation function from the input speech. This inverse filter may be represented as

$$ \frac{E(z)}{S(z)} = 1 - \sum_{i=1}^{m} a_i z^{-i} $$

or as

$$ e(nT) = s(nT) - \sum_{i=1}^{m} a_i\, s((n-i)T) $$

and is constructed as shown in figure 9.

FIGURE 9. INVERSE FILTER (input s(nT), output e(nT))


The input speech has been broken into vocal tract characteristics, determined by the prediction coefficients, and excitation signal characteristics, which remain to be determined. During the encoding process the output of the inverse filter may also be considered an error signal, because it is the difference between the actual speech sample and the predicted speech sample.
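The inverse filter can be sketched the same way. The example below is a minimal illustration under stated assumptions: both filters start from rest, and the 2nd-order coefficients are hypothetical values rather than ones measured in the thesis. It excites the synthesis filter with a single impulse and shows that the inverse filter returns that impulse as its error signal.

```python
def inverse_filter(speech, a):
    """Prediction error e(n) = s(n) - sum_{i=1}^{m} a_i s(n-i).

    Samples before n = 0 are treated as zero (an assumption of this sketch).
    """
    e = []
    for n, s_n in enumerate(speech):
        predicted = sum(a_i * speech[n - i]
                        for i, a_i in enumerate(a, start=1) if n - i >= 0)
        e.append(s_n - predicted)
    return e


def synthesize(excitation, a):
    """All-pole synthesis s(n) = e(n) + sum_{i=1}^{m} a_i s(n-i)."""
    s = []
    for n, e_n in enumerate(excitation):
        s.append(e_n + sum(a_i * s[n - i]
                           for i, a_i in enumerate(a, start=1) if n - i >= 0))
    return s


a = [1.2, -0.6]                            # hypothetical 2nd-order predictor
speech = synthesize([1.0] + [0.0] * 9, a)  # impulse-excited "speech"
error = inverse_filter(speech, a)          # recovers the single impulse
```

With the same coefficients on both sides, the inverse filter exactly undoes the vocal tract model, so everything the predictor cannot explain appears in e(nT).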

During voiced speech the vocal tract filter in figure 9 acts as a model for the total transfer function, which is due to the glottal pulse shape, the actual vocal tract shape and the output reflection at the lips. Ideally, during voiced speech all of these effects are removed by the inverse filter and the error function is a train of impulses at the pitch frequency.

During unvoiced speech the physical excitation function is a pseudo-random air pressure variation caused by turbulence at a constriction somewhere along the vocal tract. This wide-band source is filtered by the portion of the vocal tract between the constriction and the lips. This portion of the vocal tract will resonate at certain characteristic frequencies, but normally its frequency response will have fewer peaks than for voiced sounds, because a shorter segment of the vocal tract is in use. During encoding of unvoiced speech the output of the inverse filter is pseudo-random, because the inverse filter cannot predict the output produced by the random input.

The speech model is not complete with just the determination of the coefficients of the vocal tract filter. During speech reconstruction it is necessary to know:

(1) Which excitation signal, pulses or noise, to use.

(2) The excitation pulse period for voiced sounds.

(3) The gain multiplication factor.

Although these quantities are not necessarily determined using linear prediction theory, they are nonetheless required for a working speech encoding/decoding system.

During encoding, the marked difference in the error signal for voiced and unvoiced speech can be used as the basis for the voiced/unvoiced decision. The energy of the error signal for voiced speech should be rather small in comparison to the energy of the input samples. On the other hand, during unvoiced speech the prediction is poor and most of the energy remains after filtering. The ratio of the average energy or root-mean-square value of the speech samples to the corresponding quantity for the error signal can be used to make the voiced/unvoiced decision. This ratio is compared to an empirically determined threshold, and the segment is considered voiced whenever the ratio is greater than the threshold.

The gain used during reconstruction is the amplitude multiplier of the excitation signal at the input of the vocal tract filter. The gain used during unvoiced speech may be simply the root-mean-square value of the error signal. This gain coefficient is multiplied by the output of a random number generator which produces normally distributed numbers with a root-mean-square value of unity.

The gain of voiced speech may also be determined from the root-mean-square value of the error signal. However, during reconstruction of voiced speech the entire energy of the excitation signal is concentrated in a series of impulses, which should have the same root-mean-square value.

The root-mean-square value of a series of discrete-time impulses with amplitude a and a period of p sample intervals, averaged over N samples, is approximated by

$$ \text{rms} \approx \left[\frac{1}{N}\cdot\frac{N}{p}\,a^{2}\right]^{1/2} = a\,p^{-1/2}, \qquad N \gg p $$

The output of a unit impulse generator should then be multiplied by

$$ G = \text{rms}\; p^{1/2} $$

to ensure that the same energy is input to the vocal tract filter as was output by the inverse filter during encoding. The above method for calculating the gain needed during reconstruction is based on the assumption that the prediction error for voiced speech is caused entirely by the physical excitation function of the speaker. However, the prediction error may be increased because the vocal tract was changing shape rapidly during the analysis frame, or because of background noise at the microphone which would not be removed by the inverse filter. Either of these would cause an unwanted gain increase during reconstruction. A typical voiced speech waveform and the error signal generated from it are shown in figure 10.
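The two gain rules can be combined in a short excitation generator. This is a sketch, not the thesis program: it assumes the first impulse falls at n = 0 and uses Python's `random.gauss` as the unit-RMS normal generator. The name `make_excitation` is hypothetical.

```python
import math
import random


def rms(x):
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def make_excitation(error_rms, voiced, pitch_period, n_samples, rng=random):
    """One frame of excitation for the vocal tract filter.

    Voiced: unit impulses every `pitch_period` samples scaled by
    G = rms * p**0.5, so the train's RMS (about G * p**-0.5) equals the
    measured error RMS.  Unvoiced: unit-RMS Gaussian noise scaled by the
    error RMS.  Placing the first impulse at n = 0 is an assumption.
    """
    if voiced:
        g = error_rms * math.sqrt(pitch_period)
        return [g if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    return [error_rms * rng.gauss(0.0, 1.0) for _ in range(n_samples)]


frame = make_excitation(2.0, True, 64, 256)   # impulses of height 16.0
print(rms(frame))                             # -> 2.0
```

When the frame length is an exact multiple of the pitch period, as here, the relation rms = G p^(-1/2) holds exactly rather than approximately.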
FIGURE 10. (A) VOICED SPEECH WAVEFORM (B) ERROR SIGNAL WAVEFORM

The reliable determination of the pitch period of voiced speech is a problem for which the ideal solution is still undetermined. The periodic increase in the amplitude of the error signal at the pitch period, shown in figure 10(b), suggests the use of the error signal in pitch period determination. A number of algorithms exist for determination of the pitch period, which generally involve various combinations of the following processes:

(1)  Raising  the  error  signal  to  a given  power. 

(2)  Low-pass  filtering  of  the  error  signal. 

(3)  Windowing  the  error  signal. 

(4) Calculating the autocorrelation function of the filtered error signal.

(5) Picking the peaks of the autocorrelation function.

Experience has shown that pitch determination is computationally as difficult as the LPC parameter determination, and the literature on the subject illustrates the trade-off between hardware, software, computation time and reliability from method to method.
C. LPC COMMUNICATION SYSTEMS

A review of existing LPC communication hardware is useful, because any method which alters the formant and pitch characteristics of speech will be most successful if it is compatible with these systems.

Currently, off-the-shelf microprocessors are not fast enough to handle the algorithms described in real time. However, special purpose units which are designed along computer lines do meet the real-time criteria. On the surface the word 'computer' might not seem to fit these special purpose machines, but a closer look will reveal that each has components which are the same as those of a computer: stored programming, memory, input, output, an arithmetic logic unit (ALU), an instruction set, and control components. Two processors which were developed at MIT's Lincoln Laboratory will be used to illustrate the state of the art in LPC voice terminals, and certain similarities in their architecture will be evident. The first processor is the more flexible of the two and is designed to handle a wider variety of algorithms. The second was developed about a year later and was designed specifically for LPC algorithms, with only minor changes.

The first processor to be covered is the Lincoln Digital Voice Terminal (LDVT), which was designed and constructed at the Lincoln Laboratory during the 1973-75 time frame. This processor is capable of carrying out 18 million basic instructions per second, with a 16-bit by 16-bit multiplication taking four times as long. The execution time for each instruction is 165 nsec, which seems to conflict with the instruction rate. This is resolved by the pipelining of the three portions of each basic instruction: fetch, decode, and execute. The processor has separate memories for data and the program. The data memory capacity is 512 16-bit words and the program memory contains 1024 16-bit instructions. The pipelined instruction processing requires that the buses to and from the ALU be separate, and each is unidirectional.
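The apparent conflict can be checked with a little arithmetic, assuming the three stages overlap perfectly so that a new instruction completes every third of the 165 nsec execution time:

$$ \frac{165\ \text{nsec}}{3} = 55\ \text{nsec}, \qquad \frac{1}{55\ \text{nsec}} \approx 18.2 \times 10^{6}\ \text{instructions per second} $$

This agrees with the quoted rate of 18 million instructions per second. On the same assumption, a multiplication taking four instruction slots would occupy about 4 × 55 = 220 nsec.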

Figure 11 shows the data paths of the LDVT (none of the control or timing lines are shown). There are four active registers: the P register, which is the program counter, with multiplexed inputs from the address portion of the instruction, the ALU, the sum of the X register and the address portion of the instruction, and itself incremented by one; the X register, which is used for indexing memory addresses; the A register, which is the accumulator; and the B register, which is actually a pair of registers used for input and output.
FIGURE 11. LDVT DATA FLOW

The ALU of the LDVT, shown separately in figure 12, has two sections: a standard programmable ALU which performs logical, addition and compare operations; and a 16-bit by 16-bit multiplier array which provides a 32-bit result in just 4 cycles. Either of these may be used with any input; however, due to their common input and output, only one may be used at a time.

It is significant to note some of the requirements brought on by the pipelining of the instructions. The device does not have a main bus over which data flows in both directions. Generally all data flow is unidirectional, and in the case of the ALU, input buffer registers are needed to hold the data for the instruction being executed while the next instruction may have already read a value from memory and put it on the ALU input line. In addition to LPC algorithms at 2400, 5600 and 4800 bits per second, the LDVT has been programmed for adaptive predictive coding at 3000 bits per second and as a channel vocoder at 2400 bits per second.


FIGURE 12. LDVT ALU

The second speech processor is the Linear Predictive Coding Microprocessor (LPCM), which is designed strictly as a low-cost LPC terminal. The basic cycle time for this machine is 150 nsec. The data memory has 2K 16-bit words, of which 1.5K is ROM and 0.5K is RAM. The program memory contains 1K of 48-bit words. The LPCM is almost free of instruction decoding, the only exception being the ALU operation. Figure 13 shows the instruction format, and in figure 14 it is evident that parts of the instruction register are being input as control functions. Figure 15 is a block diagram of the LPCM and shows the two buses and the large number of registers needed to control the data flow.

While these machines have varying degrees of adaptability, it does not appear that either could handle the additional computations described in the following sections without major hardware modifications. However, a special purpose LPC code converter, which could be used in conjunction with an existing terminal, could probably be developed which would operate in real time and not load the existing processor.
FIGURE 13. LPCM INSTRUCTION FORMAT
FIGURE 14. LPCM CENTRAL PROCESSOR


V.  ADJUSTMENT  OF  VOCAL  TRACT  PARAMETERS  USING  LPC 


One reference to voice characteristic modification was found by the author [Atal and Hanauer, 1971]. Although scaling of pitch, formant frequency and formant bandwidth was stated to have been accomplished, no description of the work was given. Other literature did provide useful information on formant frequencies and pitch periods which are typical for various speakers. It should be noted that there is a considerably larger variation, from speaker to speaker, in pitch period than in formant frequencies. As an example, two speakers saying the same phoneme could easily have pitch periods that varied by a factor of two, yet have only a 10-20 per cent variation in formant frequencies.

Different physical structures (the vocal cords and the vocal tract) produce these speech characteristics (pitch period and formant frequencies, respectively), and therefore their variation from speaker to speaker is only partially correlated.

The coded information produced from input voice by the LPC processor is very closely related to the physical structure that is producing the sound. On output, speech is reconstructed from the gain, pitch period and voiced/unvoiced parameters as well as the vocal tract prediction coefficients. The gain and pitch period can be varied as they stand, but the variation of the prediction coefficients is somewhat more complicated. The goal of varying these coefficients before reconstruction is to have the output voice have a different pitch period and formant frequencies while retaining a natural sound and the same information, i.e. the same sequence of phonemes and voice inflection.

Voice characteristics are associated with certain parameters of the LPC code. First, formant frequencies and bandwidths are associated with the LPC coefficients. The amplitude of the output voice is associated with both the gain coefficient and the formant bandwidths. The relationship between output amplitude and formant bandwidth is due to the increased energy in the impulse response of a narrow-bandwidth (high Q) transfer function. This is noted physically by the fact that speakers with highly resonant voices may speak louder for the same amount of energy expended. The pitch period is controlled by the pitch period coefficient only. Finally, the voiced/unvoiced decision would normally not be changed. The exception would be if one were reconstructing whispered speech (the vocal cords are stationary) from normal speech.

A.  ADJUSTMENT  OF  FORMANT  FREQUENCY  AND  BANDWIDTH 

The vocal tract model we are using has all real coefficients in the z-domain polynomial. It follows directly that all poles must fall either on the real axis of the z-plane or in complex conjugate pairs. Each of the complex conjugate pairs is associated with one formant (resonator) of the speech model. The vocal tract transfer function is the product of these resonator transfer functions, each of which has the following form

$$ H_f(z) = \frac{1}{1 - 2e^{-2\pi (BW) T_s}\cos(2\pi F T_s)\, z^{-1} + e^{-4\pi (BW) T_s}\, z^{-2}} $$

where F is the center frequency of formant f, BW is the bandwidth of the formant, and $T_s$ is the sampling interval. The pole locations associated with this transfer function are

$$ z = x \pm jy $$

This pair of poles must be moved in order to alter the frequency and bandwidth of this resonant section of the vocal tract model, but this must be done carefully so that the poles remain inside the z-plane unit circle. If the desired modification of the input speech is to reduce the bandwidth (increase Q) of the formants, the poles must be moved closer to the unit circle. If the distance from the center is multiplied by a constant factor, there is a danger of moving poles outside the unit circle and thereby causing instability during reconstruction. However, the magnitude of the pole is always less than one and may be raised to any positive power without danger of crossing the unit circle. It is shown as follows that raising the magnitude to a power is equivalent to multiplying the formant bandwidth by that same factor.

The transfer function with the complex conjugate poles above is:

$$ H(z) = \frac{1}{1 - 2x\, z^{-1} + (x^{2} + y^{2})\, z^{-2}} $$

However, with the pole locations in polar form,

$$ x = A\cos\theta, \qquad y = A\sin\theta, $$

and making use of

$$ \cos^{2}\theta + \sin^{2}\theta = 1, $$

the equation becomes

$$ H(z) = \frac{1}{1 - 2A\cos\theta\, z^{-1} + A^{2} z^{-2}} $$

Setting the corresponding terms of the characteristic equations equal, we get

$$ 2A\cos\theta = 2e^{-2\pi (BW) T_s}\cos(2\pi F T_s) $$

and

$$ A^{2} = e^{-4\pi (BW) T_s} $$

which, when solved for A and θ, give

$$ A = e^{-2\pi (BW) T_s}, \qquad \theta = 2\pi F T_s $$

and inversely

$$ F = \frac{\theta}{2\pi T_s}, \qquad BW = \frac{-\ln A}{2\pi T_s} $$

If new formant characteristics, F' and BW', are desired, where

$$ F' = \beta F \qquad \text{and} \qquad BW' = \alpha\, BW, $$

they may be implemented by moving the poles of the characteristic equation so that

$$ \theta' = \beta\theta \qquad \text{and} \qquad \ln A' = \alpha \ln A, $$

which reduces to

$$ A' = A^{\alpha} $$

This method of implementing the pole shifts guarantees that no unstable poles will be created, and it is used in the following section in the realization of an LPC voice modification system.
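A minimal sketch of the pole shift for one conjugate pair, using the thesis's convention A = exp(-2π BW T_s) and θ = 2π F T_s. It assumes βθ stays below π, so the new formant remains under half the sampling rate. The function name and the numeric example (a 10 kHz equivalent rate, a 500 Hz formant) are hypothetical illustration values.

```python
import cmath
import math


def shift_formant(F, BW, Ts, alpha, beta):
    """Move one conjugate pole pair: theta' = beta*theta, A' = A**alpha.

    Uses the pole convention derived above, A = exp(-2*pi*BW*Ts) and
    theta = 2*pi*F*Ts.  Returns the new upper pole (its conjugate is the
    other member of the pair) and the formant F', BW' it represents.
    beta*theta is assumed to stay below pi (i.e. F' below fs/2).
    """
    A = math.exp(-2 * math.pi * BW * Ts)
    theta = 2 * math.pi * F * Ts
    A_new = A ** alpha          # 0 < A < 1, so 0 < A**alpha < 1 for alpha > 0
    theta_new = beta * theta
    pole = cmath.rect(A_new, theta_new)
    F_new = theta_new / (2 * math.pi * Ts)
    BW_new = -math.log(A_new) / (2 * math.pi * Ts)
    return pole, F_new, BW_new


# 10 kHz equivalent sampling rate; lower a 500 Hz formant by 10 per cent
# and halve its bandwidth parameter.
pole, F_new, BW_new = shift_formant(500.0, 60.0, 1e-4, alpha=0.5, beta=0.9)
print(round(F_new, 6), round(BW_new, 6), abs(pole) < 1)   # -> 450.0 30.0 True
```

Because the radius is raised to a power instead of being multiplied, any α > 0 keeps the pole inside the unit circle. A radius that is merely scaled by a constant has no such guarantee.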

B. GAIN ADJUSTMENT

The filter coefficients reconstructed from the relocated poles above may not have the same zero-frequency gain characteristic as the filter used for inverse filtering during encoding. This situation can be illustrated graphically by the two vocal tract transmission characteristics shown in figure 16.


FIGURE  16.  FORMANT  GAIN 
Although the formant frequencies in 16(b) are lower than the corresponding frequencies in 16(a), as was desired, the overall gain was also changed. This would cause the reconstructed speech to be much softer than desired.

A solution to this problem was to adjust the excitation function gain used during reconstruction. This adjustment factor would be equal to the ratio of the zero-frequency gains of the original and modified vocal tract filters. The vocal tract has the following z-domain transfer function:

$$ H(z) = \frac{1}{1 + \sum_{i=1}^{p} a_i z^{-i}} $$

The above equation can be evaluated at

$$ z = e^{j2\pi f/f_s} $$

to obtain the gain at frequency f. Evaluating the above transfer function at f = 0 yields the following equations:

$$ z = 1 $$

and

$$ G(0) = \frac{1}{1 + \sum_{i=1}^{p} a_i} $$

This equation can easily be evaluated both for the coefficients of the vocal tract transfer function calculated from the input sequence and for the coefficients calculated from the altered pole locations. The gain multiplication factor is then multiplied by the energy measured in the error signal to get the excitation gain to be used during reconstruction.
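A sketch of the zero-frequency gain correction, assuming the pole pairs are already known. It rebuilds each filter polynomial from its poles, evaluates G(0) = 1/(1 + Σ a_i), and takes the ratio of the original gain to the modified gain as described above. The helper names, the two formants and the error level of 1.7 are hypothetical illustration values, not thesis data.

```python
import cmath
import math


def poly_from_poles(poles):
    """Coefficients [1, a_1, ..., a_p] of prod(1 - p_k z^-1).

    With poles supplied in conjugate pairs the imaginary parts cancel,
    so only the real parts are returned.
    """
    c = [1 + 0j]
    for pk in poles:
        nxt = c + [0j]
        for k in range(1, len(nxt)):
            nxt[k] -= pk * c[k - 1]
        c = nxt
    return [x.real for x in c]


def zero_freq_gain(coeffs):
    """G(0) = 1 / (1 + sum a_i) for H(z) = 1 / (1 + sum a_i z^-i)."""
    return 1.0 / sum(coeffs)


def pair(F, BW, Ts):
    """Conjugate pole pair, A = exp(-2 pi BW Ts), theta = 2 pi F Ts."""
    p = cmath.rect(math.exp(-2 * math.pi * BW * Ts), 2 * math.pi * F * Ts)
    return [p, p.conjugate()]


Ts = 1e-4                                   # assumed 10 kHz equivalent rate
original = poly_from_poles(pair(500.0, 60.0, Ts) + pair(1500.0, 80.0, Ts))
modified = poly_from_poles(pair(450.0, 30.0, Ts) + pair(1350.0, 40.0, Ts))
factor = zero_freq_gain(original) / zero_freq_gain(modified)
excitation_gain = 1.7 * factor              # 1.7: hypothetical error-signal level
```

Each conjugate pair contributes $1 - 2A\cos\theta\,z^{-1} + A^2 z^{-2}$, so the rebuilt coefficients follow the same "1 +" convention as the transfer function above.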


C.  PITCH  PERIOD  ADJUSTMENT 

The adjustment of the measured pitch period may almost go without explanation, except to note that if the pitch period is increased and all other coefficients remain unchanged, the output speech will be softer. This is due to the reduced energy (impulses occurring less often) being input to the vocal tract filter and the resulting lower energy in the output speech.
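As a worked illustration based on the gain relation G = rms p^(1/2) derived earlier, assume G is computed from the measured period p and left unchanged. Issuing the impulses every p' samples then gives an excitation root-mean-square value of approximately

$$ \text{rms}' \approx G\,p'^{-1/2} = \text{rms}\left(\frac{p}{p'}\right)^{1/2} $$

Doubling the pitch period (p' = 2p) therefore lowers the excitation root-mean-square value by a factor of $1/\sqrt{2}$, a power drop of $10\log_{10}(1/2) \approx -3$ dB.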


VI.  COMPUTER  SIMULATION  OF  PITCH  AND  FORMANT  MODIFICATION 


The process of pitch and formant modification was carried out on the IBM 360 computer, with the input and output being accomplished on a hybrid system consisting of a COMCOR 5000 analog computer and an XDS 9300 digital computer. The interface between the XDS 9300 and the IBM 360 was seven-track digital magnetic tape. All work was done on five-second segments to allow sufficient length for analysis while not using excessive computer processing time.

A.  VOICE  INPUT  AND  DIGITAL  SAMPLING 

The input voice was recorded on a standard single-track audio tape recorder at 7 1/2 inches per second (ips). Recording was done with a high quality microphone in a quiet but not sound-proof room. The digitizing was done at half speed to allow the digital computer to write the data onto tape without missing any data. The recording was played back at 3 3/4 ips, with the output directed to an amplifier of the analog computer. The voice was amplified to a level appropriate for the analog computer (a ±100 volt machine). The amplifier output was passed through two fourth-order analog filters set at 2350 Hz and 2400 Hz cut-off frequencies. The output of the filters was then put into a sample-and-hold circuit at the input of a 14-bit analog-to-digital converter. The 14 bits produced were read by the XDS 9300 and placed in the most significant bits of the 24-bit XDS 9300 computer word. This process is illustrated in figure 17.


FIGURE 17. DATA ACQUISITION


The sampling rate used was 5000 Hz. However, the voice recording was played back at half speed, and therefore the equivalent low-pass filter cut-off and the equivalent sampling rate were about 4750 and 10,000 Hz respectively.


B. XDS 9300 OPERATION

The operation of the XDS 9300 during the input phase was simply to read the data available at the output of the analog-to-digital converter and place this data in an array. When an array of 1024 samples was filled, it was written onto a seven-track magnetic tape. This was done continuously so that no data was lost between blocks. The voice segment as it existed on the seven-track tape consisted of 50 blocks of 1024 samples. Each sample was recorded in an integer format ranging from +8388607 to -8388607 (±(2^23 - 1)). This tape was then used as the input to the IBM 360.
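The figures quoted in this section fit together as follows. This consistency check uses only numbers given in the text:

$$ f_{s,\text{eq}} = 2 \times 5000\ \text{Hz} = 10{,}000\ \text{Hz}, \qquad 2 \times 2350\ \text{Hz} = 4700\ \text{Hz}, \qquad 2 \times 2400\ \text{Hz} = 4800\ \text{Hz} $$

$$ 50 \times 1024 = 51{,}200\ \text{samples}, \qquad \frac{51{,}200}{10{,}000\ \text{Hz}} = 5.12\ \text{s} $$

The result matches the five-second segments. Each 1024-sample block spans 102.4 msec at the equivalent rate, which is four 256-sample (25.6 msec) analysis frames.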


C. IBM 360 INPUT PREPARATION

When the 24-bit word, seven-track tape created by the XDS 9300 was read by the IBM 360, the machine representation of the values was not correct. This was due to the addition of the eight bits shown in figure 18.

FIGURE 18. 24-BIT XDS 9300 WORD, 32-BIT WORD READ BY IBM 360, AND CORRECTED IBM 360 WORD


The data conversion program (Appendix A.1) was used to read the data from the seven-track tape and move the bits of each value as required. The program did not make the conversion from ones complement representation (XDS 9300) to twos complement representation (IBM 360), because any error caused would be well below the 14-bit quantization error. At this point the data was converted to floating-point representation with values between ±100.0, and the average value of each sequence was calculated and subtracted from each data point. This ensured that the input was a zero-mean function. Each data sequence was written into a separate file of a standard nine-track IBM 360 tape for ease of further handling.
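A sketch of the scaling and mean removal, together with a check of the claim that skipping the ones-to-twos complement conversion is harmless. It assumes the 24-bit word has already been recovered from the 32-bit tape word (the bit rearrangement of figure 18 is not reproduced here). All function names are hypothetical.

```python
FULL_SCALE = 2 ** 23 - 1          # 8388607, the largest XDS 9300 magnitude


def ones_complement_24(word):
    """Value of a 24-bit ones-complement word given as an unsigned integer."""
    return word if word < 2 ** 23 else word - (2 ** 24 - 1)


def twos_complement_24(word):
    """Value of the same 24 bits read as twos complement (no conversion)."""
    return word if word < 2 ** 23 else word - 2 ** 24


def prepare(words):
    """Scale to +/-100.0 and subtract the mean, as described in the text.

    Words are read as twos complement, i.e. the ones-to-twos conversion
    is skipped, as in the thesis program.
    """
    x = [100.0 * twos_complement_24(w) / FULL_SCALE for w in words]
    mean = sum(x) / len(x)
    return [v - mean for v in x]


# Skipping the conversion shifts a negative value by one 24-bit count.
# With the 14 data bits in the top of the word, one 14-bit step is
# 2**(24 - 14) = 1024 counts, so the error is 1/1024 of a quantization step.
w = 2 ** 24 - 1 - 5               # -5 in 24-bit ones complement
print(ones_complement_24(w), twos_complement_24(w))   # -> -5 -6
```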

D.  SCOPE  OF  SIMULATION  PROGRAM 

The goal of this research was to demonstrate the feasibility of voice modification, and as a result only certain areas were studied. Specifically, all programming was done with the standard IBM 360 floating-point arithmetic, making no allowance for the effects which would be caused by the shorter word length and integer representation used in most voice processing systems. Further study of that area is warranted and would be especially critical in the determination of the pole locations, which is covered later.

The system degradation caused by background noise in the input speech was not studied, except to note that the voiced/unvoiced decision threshold would need to be adjusted for a noisy environment.

Although the programs were written to allow variation in the order of the prediction, the number of samples per frame and the sampling interval, these were not varied. A 12th-order vocal tract filter was used throughout and proved to be satisfactory. The analysis frame length was 25.6 msec (256 samples) and also remained unchanged. In any future use of these programs with a different frame length, attention to the input format would be required to ensure that the analysis frame length is an integral multiple of the input record length.


Finally, in the following description of the programs, the term 'LPC coefficients' will refer to the coefficients of the vocal tract model filter. The term 'LPC parameters' will refer to the entire set of parameters needed to reconstruct the output speech, i.e. the LPC parameters consist of the LPC coefficients, the gain parameter, the pitch period and the voicing indicator.

E.  LPC  ENCODING 

The  first  step  of  the  encoding  process  was  to  determine 
the  filter  coefficients.  These  coefficients  were  used  in 
the  inverse  filter  for  determination  of  the  error  signal. 
The  root  mean  square  values  of  the  input  and  error  signals 
were  compared  to  determine  if  the  frame  was  voiced  or 
unvoiced.  Finally  the  pitch  period  was  determined  for 
voiced frames. This program is listed in Appendix A.2.

1. LPC Coefficient Determination

Determination of the LPC coefficients was done with the autocorrelation method in the subroutine named AUTO. First, the input data, s(n), was windowed by one of four available windows, producing a temporary array, t(n), of the windowed data.

$$ t(n) = W(n)\, s(n) $$

The discrete autocorrelation of the temporary array was calculated for the discrete displacements of zero to the predictor order, p.

$$ R(j) = \sum_{i=1}^{N-j} t(i)\, t(i+j), \qquad 0 \le j \le p $$
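A minimal sketch of the AUTO step. The thesis offers four windows without naming them here, so the Hamming default below is an assumption. Indices run from 0 rather than 1, and the function names are hypothetical.

```python
import math


def hamming(N):
    """Hamming window of length N (N > 1)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def autocorrelation(s, p, window=None):
    """R(j) = sum_i t(i) t(i+j), 0 <= j <= p, where t(n) = W(n) s(n).

    The Hamming default is an assumption; any of the thesis's windows
    can be passed in explicitly.
    """
    N = len(s)
    w = hamming(N) if window is None else window
    t = [w[n] * s[n] for n in range(N)]
    return [sum(t[i] * t[i + j] for i in range(N - j)) for j in range(p + 1)]


print(autocorrelation([1.0, 2.0, 3.0], 2, window=[1.0, 1.0, 1.0]))  # -> [14.0, 8.0, 3.0]
```

With a rectangular window the example can be checked by hand: R(0) = 1 + 4 + 9, R(1) = 1·2 + 2·3 and R(2) = 1·3.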


The next step was the solution of the following matrix equation.

$$ \sum_{j=1}^{p} R(|i-j|)\, a_j = R(i), \qquad 1 \le i \le p $$


The autocorrelation matrix is always positive definite and symmetric, and all values along a given diagonal are equal. A particularly efficient method of solution is available. This method is attributed to Durbin [Makhoul, 1975] and is implemented in subroutine COEFF. Durbin's algorithm is recursive and calculates the predictor coefficients for the kth order from the coefficients for the (k-1)th order. The jth coefficient for the kth-order predictor is $a_j(k)$. The recursion formulas follow.


$$ E(0) = R(0) $$

$$ a_k(k) = \frac{R(k) - \sum_{i=1}^{k-1} a_i(k-1)\, R(k-i)}{E(k-1)}, \qquad 1 \le k \le p $$

$$ a_j(k) = a_j(k-1) - a_k(k)\, a_{k-j}(k-1), \qquad 1 \le j \le k-1 $$

$$ E(k) = \left(1 - a_k^{2}(k)\right) E(k-1) $$

E(k) is the prediction order error resulting from limiting
the  predictor  order  to  k. 

During the programming of COEFF, the subroutine TEST was written to perform the matrix multiplication and print the results. During the initial testing of the program various window functions were used in AUTO; however, the prediction order error did not change significantly with the window function used.

Certain  researchers  have  noted  that  a lower  order 
filter  may  be  used  during  unvoiced  speech.  If  this  is 
desired,  the  coefficients  for  the  lower  order  filters  could 
be  stored  during  the  recursive  steps  of  the  algorithm  above 
and  later,  when  the  frame  is  determined  to  be  unvoiced,  the 
lower  order  filter  coefficients  would  be  available  without 
further  calculation. 
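The recursion above can be sketched in Python as follows. This is an illustration of Durbin's method under the sign convention of the matrix equation (prediction ŝ(n) = Σ a_i s(n-i)), not a transcription of COEFF. It also keeps every intermediate order's coefficients, which is what storing the lower-order filters would require. The names are hypothetical.

```python
def durbin(R, p):
    """Durbin's recursion for sum_j R(|i-j|) a_j = R(i), 1 <= i <= p.

    Returns (a, E, orders): a = [a_1(p), ..., a_p(p)] for the predictor
    s^(n) = sum_i a_i s(n-i), E = E(p), and orders[k-1] = the order-k
    coefficients, kept so lower-order filters need no recomputation.
    Assumes R comes from a positive definite autocorrelation matrix.
    """
    a = []
    E = R[0]                                             # E(0) = R(0)
    orders = []
    for k in range(1, p + 1):
        a_kk = (R[k] - sum(a[i - 1] * R[k - i] for i in range(1, k))) / E
        # a_j(k) = a_j(k-1) - a_k(k) a_{k-j}(k-1), 1 <= j <= k-1
        a = [a[j - 1] - a_kk * a[k - j - 1] for j in range(1, k)] + [a_kk]
        E = (1.0 - a_kk * a_kk) * E                      # E(k)
        orders.append(list(a))
    return a, E, orders


a, E, orders = durbin([1.0, 0.5, 0.25], 2)
print(a, E)                          # -> [0.5, 0.0] 0.75
char_poly = [1.0] + [-x for x in a]  # negated values, as in the main program
```

The example autocorrelation is that of a first-order process, so the second coefficient comes out as zero and the error stops decreasing after order one.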

The coefficients, $a_i$, used in the main program are the coefficients of the characteristic polynomial of the filter, with $a_0$ assumed to be unity.

$$ H(z) = \frac{1}{\sum_{i=0}^{p} a_i z^{-i}} $$

Therefore the negatives of the values calculated in COEFF were returned to the main program.
2. Error Signal Determination

The error signal, e(n), is determined by subtracting the predicted sample value, $\hat{s}(n)$, from the actual value, s(n).

$$ e(n) = s(n) - \hat{s}(n) $$

$$ \hat{s}(n) = -\sum_{i=1}^{p} a_i\, s(n-i) $$

$$ e(n) = s(n) + \sum_{i=1}^{p} a_i\, s(n-i) $$


This operation is carried out by subroutine ERR. In order to make a correct error determination at the beginning of each frame, a number of samples equal to the order of the predictor were saved from the end of the previous frame. This eliminated additional error signal energy caused by poor beginning-of-frame prediction and reduced the possibility of an incorrect voicing decision. Another possible solution to this problem would be simply not to analyze the error for the first few samples of each frame, and to make the appropriate changes in the following routines that use the error signal.
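A sketch of ERR's frame handling, assuming the main-program sign convention e(n) = s(n) + Σ a_i s(n-i) with the negated COEFF values, and silence before the first frame. The last p input samples of each frame are carried into the next frame. The function name and test signal are hypothetical.

```python
def error_frames(frames, a):
    """Frame-by-frame prediction error e(n) = s(n) + sum_i a_i s(n-i).

    `a` holds the main-program coefficients (the negated COEFF values).
    The last p input samples of each frame are carried into the next one,
    so the first samples of a frame are predicted from real history.
    Silence before the first frame is an assumption of this sketch.
    """
    p = len(a)
    history = [0.0] * p
    errors = []
    for frame in frames:
        buf = history + list(frame)
        errors.append([buf[n] + sum(a_i * buf[n - i] for i, a_i in enumerate(a, start=1))
                       for n in range(p, len(buf))])
        history = buf[len(buf) - p:]       # last p samples (empty when p == 0)
    return errors


s = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, 2.5, -0.5]
a = [-0.9, 0.2]                            # hypothetical coefficients
split = error_frames([s[:4], s[4:]], a)
whole = error_frames([s], a)[0]
print(split[0] + split[1] == whole)        # -> True
```

Because the carried-over samples supply the same history the predictor would have seen in one long record, splitting the signal into frames leaves the error signal unchanged.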

3. Voicing Decision

A comparison of the input signal energy and the error signal energy was used to determine whether a particular frame is voiced or unvoiced. Although the root-mean-square value of each set of data is actually proportional to the square root of the energy in the signal, the root-mean-square value was used in this comparison. Whenever the root-mean-square value of the input signal divided by the root-mean-square value of the error signal was greater than a threshold value, the frame was determined to be voiced and the voicing indicator was set to one. Otherwise the voicing indicator was set to zero.
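The voicing rule reduces to a few lines. In this sketch the threshold is a parameter because the text gives no value for it. The handling of a zero-error frame is an added assumption that avoids division by zero. Function names are hypothetical.

```python
import math


def rms(x):
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def voicing_indicator(frame, error, threshold):
    """1 (voiced) if rms(input) / rms(error) exceeds the threshold, else 0.

    The threshold is empirical and its value is not given in the text.
    Treating a frame with zero error as voiced unless it is silent is an
    assumption made here to avoid dividing by zero.
    """
    e = rms(error)
    if e == 0.0:
        return 1 if rms(frame) > 0.0 else 0
    return 1 if rms(frame) / e > threshold else 0


print(voicing_indicator([3.0, -3.0], [1.0, -1.0], threshold=2.0))  # -> 1
```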

4. Pitch Period Determination

The error signal was used in subroutine PITCH to determine
the pitch period of each voiced frame. First the error
signal was passed through a recursive fifth-order
Butterworth filter with an 800 Hz cutoff to smooth the
signal. Extra samples of the error signal and the filtered
error signal were saved from frame to frame (zeroed during
unvoiced frames) to ensure a correct filtered error signal
at the beginning of each frame. Omitting this step caused
negligible degradation of the system, but plots of the
filtered error signal would then have shown discontinuities
at the beginning of each frame. The frame was windowed to
eliminate end effects, and the autocorrelation function of
the filtered error signal was calculated. The portion of
the autocorrelation function from 12 to 180 samples (1.2 to
18 msec at the 10 kHz sampling rate) was searched for peak
values, and the pitch period was set equal to the location
of this peak.

Figure 19 shows a typical autocorrelation function and the
portion of the curve searched for the peak value. The
peak-picking algorithm checked to ensure that the value
chosen was not on the downslope of the center peak and was
not a minor peak with a larger peak at a longer pitch period.
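The peak search can be sketched as follows. The sketch assumes the error signal has already been low-pass filtered and windowed; the 800 Hz Butterworth stage and the window are omitted. The function names are hypothetical. Skipping the initial downslope and keeping only the largest local maximum is one straightforward way to meet the two checks described above. It is not necessarily the exact logic of PITCH.

```python
def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation r(k) = sum_n x(n) x(n+k), k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def pick_pitch_period(x, min_lag=12, max_lag=180):
    """Lag (in samples) of the largest autocorrelation peak in
    [min_lag, max_lag], or None if the range holds no positive local maximum."""
    r = autocorrelation(x, max_lag + 1)  # one extra lag for the neighbour test
    k = min_lag
    # Skip any part of the range still on the downslope of the lag-0 peak.
    while k < max_lag and r[k + 1] < r[k]:
        k += 1
    best_lag, best_val = None, 0.0
    for lag in range(max(k, 1), max_lag + 1):
        is_peak = r[lag] >= r[lag - 1] and r[lag] >= r[lag + 1]
        # Keeping only the largest peak means a minor peak is never chosen
        # over a larger one elsewhere in the search range.
        if is_peak and r[lag] > best_val:
            best_lag, best_val = lag, r[lag]
    return best_lag
```

With the default limits and a 10 kHz sampling rate, the search covers pitch periods from 1.2 to 18 msec.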


Although this pitch determination algorithm worked
satisfactorily in this program, it is probably not as
accurate and flexible as certain other, more complicated
techniques. It was exercised only on pitch periods from
about 3 to 9 msec, for which it was satisfactory.

F. LPC PARAMETER MODIFICATION

The purpose of the program was to demonstrate the
modification of voice characteristics. The system was
designed so that only the LPC parameters were needed to
make the desired modifications; no other measurements of
the input speech are needed. Of the parameters calculated
from the input speech, only the voicing indicator remained
unchanged. The LPC coefficients are varied as the desired
formant frequency and bandwidth changes require. The pitch
period is varied separately, and the gain is adjusted to
correct for changes caused by formant bandwidth
modification.

1. LPC Coefficient Modification

The modification of the LPC coefficients is
accomplished by three subroutines: POLES, ALT, and NEWCF.
Subroutine POLES calculates the z-plane pole locations from
the LPC coefficients. Subroutine ALT changes the locations
of the poles according to the various scale factors
specified by the main program. The new predictor
coefficients are calculated by subroutine NEWCF.

The predictor coefficients, a_i, are provided to
subroutine POLES to form the pth-order z-domain polynomial,
which is factored into its component roots, the z-plane
poles of the vocal tract filter. This factorization is done
with the library routine ZRPOLY, which was sufficiently
accurate and returned complex conjugate pairs that were
exact complex conjugates. This simplified a problem that
came up later: separating the real poles from the complex
conjugate pairs so that the proper scaling factor could be
applied to each. The input polynomial had all real
coefficients; therefore all the roots are either real or in
complex conjugate pairs. These poles are placed in a
complex array and returned to the main program.
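ZRPOLY is a library routine that is not reproduced here. As a self-contained stand-in, the sketch below finds the poles with the Durand-Kerner iteration, which is a different root-finding technique swapped in for illustration. The function name is hypothetical. ZRPOLY was reported to return exact conjugate pairs, but an iterative method like this one returns pairs that agree only to within round-off. A small tolerance is therefore needed later when separating real from complex poles.

```python
def poles_from_coeffs(a, max_iter=1000, tol=1e-13):
    """Roots of z^p + a_1 z^(p-1) + ... + a_p, i.e. the z-plane poles of
    1 / (1 + sum a_i z^-i), found by Durand-Kerner iteration."""
    coeffs = [1.0] + [float(c) for c in a]
    p = len(a)

    def poly(z):
        # Horner evaluation of the monic characteristic polynomial.
        v = 0j
        for c in coeffs:
            v = v * z + c
        return v

    # Conventional distinct complex starting points.
    roots = [(0.4 + 0.9j) ** k for k in range(p)]
    for _ in range(max_iter):
        delta = 0.0
        for i in range(p):
            denom = 1 + 0j
            for j in range(p):
                if j != i:
                    denom *= roots[i] - roots[j]
            step = poly(roots[i]) / denom
            roots[i] -= step  # in-place (Gauss-Seidel style) update
            delta = max(delta, abs(step))
        if delta < tol:
            break
    return roots
```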

Subroutine ALT was provided with the complex array of pole
locations and separated them into arrays of real and
complex poles. Each complex conjugate pole pair was entered
as one entry in the complex pole array. The scaling factors
provided to subroutine ALT consisted of:


(1) FSC - Formant frequency scaling factor

(2) BSC - Formant bandwidth scaling factor

(3) RSC - Real pole scaling factor

(4) RLIM - Real pole magnitude limit

(5) SP - Sampling period

The polar coordinates were determined for each pair of
complex conjugate poles, and the magnitude, A, and angle,
θ, of each were considered separately. The magnitude was
raised to the power of the bandwidth scale factor, and the
angle was multiplied by the frequency scale factor:

$$A' = A^{\mathrm{BSC}}, \qquad \theta' = \theta \times \mathrm{FSC}.$$

The modified magnitude, A', and angle, θ', were used to
determine the new complex location, and the calculated pole
and its conjugate were put in the pole vector for output.

During the alteration process each complex pair of poles
was checked against a magnitude limit of 0.98 to ensure
that numerical instability or repeated impulses would not
cause excessively large outputs.

Each real pole was multiplied by the real pole scale factor
and checked to ensure that its magnitude was less than the
prescribed limit. The effects of varying the real poles
were not studied; a real pole limit of 0.95 proved
sufficient to damp the output enough to give a nearly
zero-mean output.
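The exponent on the magnitude follows from the usual relations for a z-plane pole at radius A and angle θ with sampling period T. Its resonant frequency is approximately θ/(2πT) Hz, and its bandwidth is approximately -ln(A)/(πT) Hz. Multiplying θ by FSC therefore scales the formant frequency by FSC, and replacing A by A^BSC scales the bandwidth by BSC. The sketch below applies these rules along with the 0.98 and RLIM limits. Several details are assumptions: the function name, the tolerance used to classify a pole as real, and the choice to clamp rather than otherwise correct poles that exceed a limit.

```python
import cmath

def alter_poles(poles, fsc, bsc, rsc=1.0, rlim=0.95, cmax=0.98, imag_tol=1e-9):
    """Scale z-plane poles in the manner described for subroutine ALT.

    Complex pairs: magnitude A -> A**bsc (then held at or below cmax),
    angle theta -> theta * fsc.  Real poles: multiplied by rsc and held
    within +/- rlim.  Returns the modified poles, conjugates included.
    """
    real_poles, upper = [], []
    for z in poles:
        if abs(z.imag) <= imag_tol:
            real_poles.append(z.real)
        elif z.imag > 0:          # keep one member of each conjugate pair
            upper.append(z)
    out = []
    for z in upper:
        mag = min(abs(z) ** bsc, cmax)
        q = cmath.rect(mag, cmath.phase(z) * fsc)
        out.extend([q, q.conjugate()])
    for r in real_poles:
        r = max(-rlim, min(rlim, r * rsc))
        out.append(complex(r, 0.0))
    return out
```

With the values used in the results (FSC = 0.88), a pole at angle 0.5 rad moves to 0.44 rad. This lowers its formant frequency by 12 percent.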

The poles from both the real and complex pole arrays were
combined into one array for return to the main program.
Subroutine ALT also provided graphical and printed output
of the pole locations, before and after modification, when
this was desired. Figure 20 is an example of the graphical
output; it shows the z-plane pole locations before and
after modification in relation to the unit circle.


FIGURE 20. VOCAL TRACT POLES (X = INPUT, + = AFTER MODIFICATION)


Subroutine NEWCF performed the task of multiplying out the
poles to calculate the coefficients of the modified
characteristic equation for the vocal tract filter. This
operation was done in double-precision arithmetic because
the predictor coefficients being calculated often differed
by only small amounts. This process would require close
study before this system could be implemented on a
short-word-length processor.
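Multiplying the poles back out is a repeated polynomial product. The sketch below expands ∏(1 - p_k z^-1) into 1 + a_1 z^-1 + ... + a_p z^-p; the function name is hypothetical. Python floats are IEEE double precision, in keeping with the double-precision arithmetic used in NEWCF.

```python
def coeffs_from_poles(poles):
    """Expand prod_k (1 - p_k z^-1) = 1 + a_1 z^-1 + ... + a_p z^-p.

    Returns [a_1, ..., a_p].  Conjugate pairs make the coefficients real,
    so only round-off is discarded when the imaginary parts are dropped.
    """
    poly = [1 + 0j]                    # coefficients of z^n, z^(n-1), ...
    for z in poles:
        nxt = poly + [0j]
        for k in range(1, len(nxt)):
            nxt[k] -= z * poly[k - 1]  # multiply by (z - z_k)
        poly = nxt
    return [c.real for c in poly[1:]]
```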

2. Pitch Period Modification

The pitch period was modified in the main program. The
modification consisted only of converting the pitch period
(an integer) to floating-point representation, multiplying
by the pitch period scale factor, and reconverting to
fixed-point representation. Although changing the pitch
period is relatively simple, modifying it causes a number
of other changes. If the pitch period is shortened, the
gain must be reduced to make up for the increased energy
being input to the vocal tract filter.
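The scaling step amounts to a single rounded multiplication. The sketch below assumes round-to-nearest, since the listing appears to add 0.5 before IFIX. The second function is not part of the original program. It is a hypothetical example of the gain reduction discussed above, based on the fact that a unit impulse train of period P carries an average power proportional to 1/P. Scaling the impulse amplitude by sqrt(P'/P) therefore keeps that power unchanged.

```python
import math

def scale_pitch_period(ipp, psc):
    """New integer pitch period: float(ipp) * psc rounded to the nearest sample."""
    return int(float(ipp) * psc + 0.5)

def pulse_amplitude_factor(old_period, new_period):
    """Hypothetical compensation: amplitude factor that keeps the average
    power of an impulse train constant (power is proportional to 1/period)."""
    return math.sqrt(new_period / old_period)
```

For example, with the pitch period scale factor of 1.73 used in the listing, a 58-sample pitch period becomes 100 samples.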

The relationship between the pitch period and the formant
bandwidth also requires further study. It appears that the
formant bandwidths (the Q's of the vocal tract resonators)
should produce an impulse response that is significantly
attenuated by the time the next impulse is input to the
filter. There is most likely a feedback effect between the
vocal tract resonators and the vocal cord vibration rate
that is not considered by the model used. This effect is
visible in the graphical output as sharp discontinuities at
the point where each new impulse is generated.
3. Gain Adjustment


Although the overall gain of the system can be adjusted
easily at the output, the relative amplitude from frame to
frame must be retained during the processing. The gain
coefficient, the root mean square of the error function, is
adjusted to account for the change in the energy of the
vocal tract impulse response brought about by the bandwidth
changes. As described earlier, the ratio of the original
and modified vocal tract filter gains at zero frequency is
used to estimate the ratio of impulse response energies.
Although this is not strictly true, it appears to work very
well as long as the scaling factors are limited to those
that produce realistic speech sounds. The zero-frequency
gain of the original vocal tract filter, G(in), is
calculated before the LPC coefficients are modified.


$$G(\mathrm{in}) = \frac{1}{\sum_{i=0}^{p} a_i}$$

The value of both a_0 and a'_0 is unity. After the
coefficients are modified, the same calculation is
performed again:

$$G(\mathrm{out}) = \frac{1}{\sum_{i=0}^{p} a'_i}$$

The root mean square of the error signal, rms(E), is
multiplied by the ratio to obtain the new coefficient,
rms'(E):

$$\mathrm{rms}'(E) = \mathrm{rms}(E) \times \frac{G(\mathrm{in})}{G(\mathrm{out})}$$
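The gain correction is a ratio of two sums. A minimal sketch follows; the function names are hypothetical. It assumes neither characteristic polynomial has a root at z = 1, which would make the zero-frequency gain infinite.

```python
def dc_gain(a):
    """Zero-frequency gain of 1 / (1 + sum a_i z^-i), i.e. 1 / sum_{i=0..p} a_i with a_0 = 1."""
    return 1.0 / (1.0 + sum(a))

def adjust_gain(rms_e, a_in, a_out):
    """rms'(E) = rms(E) * G(in) / G(out)."""
    return rms_e * dc_gain(a_in) / dc_gain(a_out)
```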

G. SPEECH RECONSTRUCTION

Reconstruction of the sampled speech waveform from the
modified LPC parameters is accomplished by subroutine
RECON. This routine not only decodes both voiced and
unvoiced speech but also makes allowance for the transition
of varying parameters from frame to frame. The LPC
parameters from the previous frame are saved between calls
to subroutine RECON and are used during the current frame
when needed. It is also necessary to save output values
from the previous frame to allow the recursive calculation
of the output values at the beginning of the current frame.

1. Unvoiced Speech

During continuous unvoiced speech (as opposed to a frame
that follows a voiced frame), the new LPC parameters are
used immediately upon entry to subroutine RECON. The
excitation function is determined by calling the library
routine GGNOF, which returns normally distributed random
numbers with zero mean and unit variance, and multiplying
the value returned by the gain parameter. The excitation
function is changed for every output sample to simulate
the continuous excitation caused by turbulent air in the
vocal tract. The vocal tract filter is implemented by the
recursive addition of weighted past output values to the
excitation function. The z-domain transfer function

$$H(z) = \frac{S(z)}{E(z)} = \frac{1}{1 + \sum_{i=1}^{p} a'_i z^{-i}}$$

is implemented with the discrete-time function

$$s(n) = e(n) - \sum_{i=1}^{p} a'_i \, s(n-i)$$

where s(n) is the output sample and e(n) is the excitation
function.
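The unvoiced branch of RECON can be sketched as follows. Python's `random.gauss` stands in for the library routine GGNOF. The function name and the way the previous frame's outputs are passed in are assumptions made for illustration.

```python
import random

def synth_unvoiced(a, gain, n, history, rng=None):
    """Noise-excited all-pole synthesis for one unvoiced frame.

    e(n) = gain * N(0, 1);  s(n) = e(n) - sum_{i=1..p} a_i * s(n - i)
    history -- at least the last p output samples of the previous frame,
               oldest first, so the recursion continues across the frame edge.
    Returns (excitation, output).
    """
    if rng is None:
        rng = random.Random()
    p = len(a)
    if len(history) < p:
        raise ValueError("need at least p previous output samples")
    buf = list(history)
    exc, out = [], []
    for _ in range(n):
        e = gain * rng.gauss(0.0, 1.0)  # new excitation value every sample
        s = e
        for i in range(1, p + 1):
            s -= a[i - 1] * buf[-i]     # buf[-i] is s(n - i)
        exc.append(e)
        out.append(s)
        buf.append(s)
    return exc, out
```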

2. Voiced Speech

During voiced speech a certain amount of continuity must be
maintained from frame to frame. This was accomplished by
allowing any uncompleted pulse from the previous frame to
finish before the parameters are changed. Immediately upon
entering the subroutine during voiced speech, the pulse
period counter is tested to see whether it equals the
former pulse period. If the former pulse is not complete,
the routine goes ahead and recursively calculates the
output values. Upon completion of a pulse from a former
frame, or of any pulse during the current frame, the new
LPC parameters replace the old ones. All parameters were
replaced directly except the gain coefficient. The
geometric mean of the old and new gain coefficients is used
as the gain for the current pulse, and the old gain is then
replaced by this geometric mean. This lets the difference
between the old and new gain parameters decay
exponentially. It also prevents sharp changes in amplitude
from frame to frame, which makes the output speech more
natural.
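The geometric-mean gain update and the pulse-excited recursion can be sketched as below. The function names are hypothetical, and the sketch relies on two simplifying assumptions. First, the frame starts exactly on a pulse boundary, so the carry-over of an unfinished pulse is not modeled. Second, each impulse's amplitude equals its pulse gain. Because each update replaces the running gain g with sqrt(g·g_new), the logarithm of g/g_new is halved at every pulse. This is the exponential decay described above.

```python
import math

def pulse_gains(g_old, g_new, n_pulses):
    """Gain for each successive pulse: the geometric mean of the running gain
    and the new frame's gain, which then becomes the running gain."""
    gains = []
    g = g_old
    for _ in range(n_pulses):
        g = math.sqrt(g * g_new)
        gains.append(g)
    return gains

def synth_voiced(a, g_old, g_new, period, n, history):
    """Impulse-train excitation through s(n) = e(n) - sum a_i s(n - i).

    Assumes a pulse starts at sample 0 of the frame and that each impulse
    has amplitude equal to its pulse gain (both simplifying assumptions).
    Returns (output, gains_used).
    """
    p = len(a)
    if len(history) < p:
        raise ValueError("need at least p previous output samples")
    buf = list(history)
    gains = pulse_gains(g_old, g_new, (n + period - 1) // period)
    out = []
    for k in range(n):
        e = gains[k // period] if k % period == 0 else 0.0
        s = e
        for i in range(1, p + 1):
            s -= a[i - 1] * buf[-i]
        out.append(s)
        buf.append(s)
    return out, gains
```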

3. Transition Frames

If the current frame and the previous frame are not of the
same type, care must be taken to ensure that all parameters
are changed together. If LPC coefficients for unvoiced
speech were used with a pulsed output, an unnatural sound
would likely be produced. During the transition from
unvoiced to voiced speech, the retained values from the
previous frame are normally small in comparison to the
amplitude of the pulsed excitation function; therefore
voiced speech production may begin immediately. In the
opposite case, the large-amplitude samples near the
beginning of an output pulse are significantly larger than
the unvoiced excitation values. Therefore, whenever
unvoiced speech follows a voiced frame, the previous output
pulse is allowed to finish. The damping that occurs during
the voiced pulse normally reduces the magnitude of the
samples near the end of the pulse to the point where they
will not interfere with the unvoiced speech that follows.
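The rule for when the previous frame's parameters give way to the new ones reduces to a small decision. The rule is described in this subsection and the preceding one. The sketch below returns how many samples are still produced with the old parameters; its name and arguments are hypothetical.

```python
def samples_before_switch(prev_voiced, pulse_pos, prev_period):
    """Samples at the start of a frame still generated with the previous
    frame's parameters (hypothetical helper).

    prev_voiced -- True if the previous frame was voiced
    pulse_pos   -- samples already produced of the pulse in progress
    prev_period -- pitch period of that pulse, in samples
    """
    if prev_voiced:
        # voiced -> voiced or voiced -> unvoiced: let the pulse finish.
        return max(prev_period - pulse_pos, 0)
    # unvoiced -> voiced or unvoiced -> unvoiced: switch immediately.
    return 0
```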

H. OUTPUT PROCESSING

The reconstructed speech samples are written onto a
standard nine-track IBM 360 magnetic tape. These values
were later input to a data conversion program (Appendix
A.4), which converted the floating-point values to integers
in the proper format for the XDS 9300 and within an
appropriate range for the XDS 9300's digital-to-analog
converter. Because a seven-track tape was still needed for
data transfer, the significant bits of the integers had to
be shifted into the proper position so that none of the
eight bits dropped during the writing of each value onto
the seven-track tape would affect the data.

This tape was input to the XDS 9300, which, via the
digital-to-analog converter, made the samples available on
the COMCOR 5000 in analog form.

These samples were output at a rate of 5000 per second
through a sample-and-hold circuit. Again, two low-pass
filters were used to remove the time quantization noise
from the samples. The analog waveform was recorded at
3 3/4 ips on a standard tape recorder and played back at
7 1/2 ips to hear the reconstructed speech. Doubling the
tape speed restores the effective rate of 10,000 samples
per second used in the analysis.

I. GRAPHICAL OUTPUT

The programs described above were also able to produce a
variety of graphical outputs to assist the researcher in
following the signals through the LPC processing. The
waveforms available from these programs are:

(1) Input speech

(2) Error signal before filtering

(3) Error signal after filtering

(4) Reconstructed output speech


The z-plane pole locations, which determine the formant
frequencies and bandwidths, were also available for
graphical display. A separate program (Appendix A.3) was
written to display the logarithmic power spectral density
of the input and output speech for a number of consecutive
frames; it proved useful in analyzing the output quality.


VII. RESULTS


The desired result of this study was the reconstruction of
speech at pitch and formant frequencies different from
those of the input speech. The complete process of
encoding, modification, and decoding was accomplished for
three 5-second segments of speech. Upon completion of the
process, most listeners agreed that although the input
speech was female, the modified output speech sounded
typically male. Although the audio output was somewhat
lacking in quality, it was intelligible.

Examples of the printed and graphical computer output are
given in Appendix B. Two examples are covered completely.
The first segment, 384 msec (15 frames) long, is of the
vowel 'e'. The second is of the transition from a fricative
to a voiced sound, 'sa', at the beginning of the word
"salt". Both were derived from a recording of a female
speaker and were reconstructed first without modification
and then with modifications consisting of multiplying the
pitch frequency by a factor of 0.58 and the formant
frequencies by a factor of 0.88. First the input waveform
is given, with the logarithmic power spectral density plot
of that portion of the speech. Examples of the printed
processing summary come next, followed by the waveforms of
the error signal and the filtered error signal. Plots of
the vocal tract pole locations are shown with the poles at
input superimposed on the poles after modification.
Finally, speech waveforms for both unmodified and modified
output are displayed with their respective logarithmic
power spectral density functions. The audio output is
available from the author on request in the form of an
audio tape recording, which is described in detail in
Appendix C.

The results above demonstrate the feasibility of using
linear predictive coding as a technique for voice
modification. This research also indicated areas in which
further study and improvement may be made. Some of these
areas are:

(1) The effect of noise during voiced speech on the
prediction error and on the gain calculated from the
error. It may be possible to use only the energy
occurring at the peaks of the error signal and thereby
attribute the remainder of the error signal to noise.

(2) The effect of using different window functions in
the autocorrelation function calculation, and how this
variation affects pitch period determination and the
voicing threshold.

(3) The possibility of constructing an LPC processing
system with asynchronous clocks for the frame timer and
the output sample generation. This would produce an
effect very similar to that accomplished here, but
probably at a reduced cost.


VIII. CONCLUSIONS


With the refinement and standardization of LPC
communication processors, the ratio of processing time to
real time for unaltered communication is expected to drop
below the current 65%. The available computation time may
be used for the pitch and formant alteration described
above. It may also be used for other modifications that can
be accomplished at either the transmitting or the receiving
processor while still allowing real-time voice
communications.

Possible applications of the speech frequency
characteristic modification described here include:

(1) A digital hearing aid for persons (such as the
author) with high-frequency hearing loss.

(2) Radios in military vehicles that would produce
speech in a frequency range different from the range of
the predominant noise in the vehicle, i.e., a low-pitched
voice in turbine aircraft with high-frequency noise and a
high-pitched voice for helicopters and tanks, where
low-frequency noise is most prevalent.

(3) Voice channel jammers that would produce random
phonemes with pitch and formant characteristics similar
to those of the current users of the channel.

As LPC communications systems become common because of
their low data rate requirements, LPC parameter
modification will be desired to extend the flexibility of
voice communication and storage systems. Frequency
modification is one viable process available.
SEVEN TRACK TO NINE TRACK TAPE CONVERSION PROGRAM


THIS PAGE IS BEST QUALITY PRACTICABLE
FROM COPY FURNISHED TO DDC


DIMENSION
FACTOR = 100.0/(
REWIND 2
REWIND 4
N=1024
K=0
K=K+1

IFU.GT.13l 
BSUM = 0.0
J=0
J=J+1

IF  ( J .LE.50  ) 


IDAT(1024)»CAT(  5 32  481 
2. 0**23) 


STOP 


GO  TO  10 

.5  »END »200 ) IOAT 


>60)  IOAT 


READ ( 2» j 
GO  TO  12 

REAC (2 ,15 »ENDa200 , EPR- 
FJRMAT(128<  8A4)  ) 

CALL FORM (IDAT,N)
JJ=(J-1)*1024
SUM = 0.0
DO 20 I=1,1024
II = I+JJ

DAT(II) = FLOAT(IDAT(I))*FACTOR
SUM = SUM + DAT(II)

CONTINUE
SUM = SUM/1024.


S8JTTO: 


J » K 


FORMAT! 40X,’*  RECORD  *, 
* • HAS  BEEN  READ  *•  ) 

IF  (J.LE.l)  WR ITE (6 .30 ) 


• ,13,'  OF  FILE  ’ ,13, 


K, SUM, (DAT! L ),Lal,102^) 


FORMAT!  • FILE  »',I3,’  AVG 
IF  (J.LE.l)  WR  ITE  ( 

FORMAT!  IX  .311  5) 

BSUM = BSUM + SUM
GO TO 8

J = J-1

WRITE (6,205) K,J

FORMAT(' END OF FILE ',I3,I6,

* ' RECORDS HAVE BEEN READ')

BSUM = BSUM/FLOAT(J)

DO 85 J=1,51200

DAT(J) = DAT(J) - BSUM

CONTINUE

WRITE (4,98) (DAT(L),L=1,51200)
FORMAT(128A4)

ENDFILE 4

WRITE (6,30) K,BSUM,(DAT(L),L=1,1024)

GO TO 5

WRITE (6,65) K

FORMAT(' ** ERR FILE',I3,' ** ')

GO TO 17
END


• ,E16 .7 //( IX ,8E14.7)  ) 


APPENDIX A.2


LINEAR PREDICTIVE CODING AND VOICE
MODIFICATION PROGRAM


LINEAR PREDICTIVE CODING AND SPEECH MODIFICATION
PROGRAM

SAMPLED SPEECH IS INPUT VIA FILE FT02F001 (TAPE OR
DISK) IN FORMAT 128A4 FOR EFFICIENT STORAGE

SPEECH IS ENCODED INTO LPC CONSISTING OF PITCH PERIOD
(IPP), VOICED/UNVOICED DECISION (IVF), GAIN FACTOR
(RMSE), AND LPC COEFFICIENTS (A(I))

MODIFICATIONS TO CHANGE POLE POSITIONS MAY BE SPECIFIED
SAMPLED SPEECH IS RECONSTRUCTED AND OUTPUT ONTO FILE
FT03F001, ALSO IN 128A4 FORMAT

PROGRAMMED BY G. T. HALL, 1978

DIMENSION X(256),A(14),XX(14),E(256),XO(256)
DIMENSION EF(256),ES(5),EFS(5),ZERO(256)

COMPLEX P(14)

DATA XX,EFS,ES,ZERO/280*0.0/

SET VOICE/UNVOICE THRESHOLD


THRESH = 2.05
IWIN = 1


SET ORDER OF PREDICTOR


PLOTTER OUTPUT

1=INPUT 2=ERROR SIGNAL

3=FILTERED ERROR 4=OUTPUT

5=POLE LOCATION (FIRST NPLPLT FRAMES)


IXPLT = 5
NPLPLT = 10

SET IWRXX=1 FOR PRINTED RESULTS FROM SUB

IWR = 1
IWRERR = 0
IWRAUT = 0
IWRALT = 1
IWRPP = 0
IWRPOL = 0
IWRNC = 1

SET MODIFICATIONS DESIRED

(FSC) FREQUENCY SCALE COEFF
(BSC) BANDWIDTH SCALE COEFF
(PSC) PITCH PERIOD SCALE COEFF
(RSC) REAL POLE SCALE COEFF
(RLIM) REAL POLE MAGNITUDE LIMIT
(SP) SAMPLING INTERVAL

FSC = 0.38
BSC = 0.63
PSC = 1.73
RSC = 1.0
RLIM = 0.95
SP = 0.0001

SET  NUMBER  OF  SAMPLES  PER  FRAME 


c 

c 

c 

c 




15 

C 

c 

c 


20 

£ 

C 


21 

C 

C 

c 


22 


23 

C 

c 

c 

c 

c 

c 


25 

c 

c 

c 


30 

40 

8 

C 

41 

42 
C 
C 
C 


SET NUMBER OF FRAMES (NFRAME) AND NUMBER OF
FRAMES SKIPPED BEFORE FIRST ANALYZED

NFRAME = 15
ISKIP = 28

IF (ISKIP.LE.0) GO TO 2
DO 1 L = 1,ISKIP

READ (2,15,END=999) (X(J),J=1,N)

CONTINUE

IF (IXPLT.LT.5.AND.IXPLT.GT.0) CALL VPLTIN(N)

DO 200 I = 1,NFRAME

READ (2,15,END=999) (X(J),J=1,N)

FORMAT(128A4)

IF (IXPLT.EQ.1) CALL VPLT(X)

DETERMINE RMS VALUE OF SPEECH SAMPLES

CALL RMS (X,N,RMSX)

IF (IWR.EQ.1) WRITE (6,20) I,RMSX

FORMAT('1FRAME ',I4//1X,'RMS VALUE OF SAMPLES = ',
* F18.8)

DETERMINE PREDICTOR COEFF BY AUTOCORRELATION METHOD

CALL AUTO (X,N,A,IP,IWIN,IWRAUT)

IF (IWR.EQ.1) WRITE (6,21) ((J,A(J)),J=1,IP)

FORMAT(/1X,'PREDICTOR COEFFICIENTS'/(10X,I3,1X,F13.8))

DETERMINE ZERO FREQ GAIN OF VOCAL TRACT TRANS FCN

GIN = 1.0
DO 22 J = 1,IP
GIN = GIN + A(J)

CONTINUE
GIN = 1.0/GIN

IF (IWR.EQ.1) WRITE (6,23) GIN

FORMAT(/' GIN=',F10.5)

DETERMINE POLES OF CHARACTERISTIC EQUATION
CALL POLES (A,IP,P,IWRPOL,ICK)

INVERSE FILTER SAMPLES TO GET ERROR SIGNAL

CALL ERR (X,N,A,IP,E,XX)

IF (IXPLT.EQ.2) CALL VPLT(E)
IF (IWRERR.EQ.1) WRITE (6,25) (E(J),J=1,N)
FORMAT(1X,10F12.4)

DETERMINE RMS VALUE OF ERROR

CALL RMS (E,N,RMSE)

IF (IWR.EQ.1) WRITE (6,30) RMSE
FORMAT(/1X,'RMS VALUE OF ERROR = ',F18.8)
RATIO = RMSX/RMSE

IF (IWR.EQ.1) WRITE (6,40) RATIO

FORMAT(/1X,'RATIO SAMPLE RMS TO ERROR RMS = ',F18.8)

TEST IF VOICED OR UNVOICED
IVF = 0
IF (RATIO.GE.THRESH) IVF = 1

IF (IVF.EQ.1) WRITE (6,41)

FORMAT(/' THIS FRAME IS VOICED'/)

IF (IVF.EQ.0) WRITE (6,42)

FORMAT(/' THIS FRAME IS UNVOICED'/)

IF UNVOICED BYPASS PITCH DETECTION

IF (IVF.EQ.0) GO TO 45
CALL PITCH (N,E,EF,ES,EFS,IPP,IWRPP)

C

c 

c 

45 

46 
C 

c 

c 

49 

E 

C 


51 

E 

C 


5J 

C 

C 

C 


52 


53 

C 

C 

C 


54 


200 

999 


IF (IXPLT.EQ.3) CALL VPLT(EF)

GO TO 49

IF UNVOICED ZERO SAVED POST FILTER ERROR

DO 46 J = 1,5
EFS(J) = 0.0
CONTINUE
IF (IXPLT.EQ.3) CALL VPLT(ZERO)

DETERMINE NEW PITCH PERIOD

IPPN = IFIX(FLOAT(IPP)*PSC+0.5)

ALTER POLE LOCATIONS

IF (I.EQ.1.AND.IXPLT.EQ.5) CALL PLOTS(IA,IB,IC)
IF (I.EQ.NPLPLT.AND.IXPLT.EQ.5) IXPLT=0
CALL ALT2(P,FSC,BSC,RSC,RLIM,SP,IP,IWRALT,IXPLT)
WRITE (6,51) IPPN

FORMAT(/' PITCH PERIOD AFTER MODIFICATION',I3)
CALCULATE NEW PREDICTOR COEFFICIENTS

CALL NEWCF(IP,P,A,IWRNC)

DO 50 J = 1,IP
JJ = J+N-IP
XX(J) = X(JJ)

CONTINUE

DETERMINE ZERO FREQ GAIN OF VOCAL TRACT TRANS FCN

GOUT = 1.0
DO 52 J = 1,IP
GOUT = GOUT+A(J)
CONTINUE
GOUT = 1.0/GOUT
IF (IWR.EQ.1) WRITE (6,53) GOUT

FORMAT(/' GOUT=',F10.5)

ADJUST OUTPUT GAIN

RMSE = RMSE*GIN/GOUT

CALL RECON(A,IP,RMSE,IVF,IPPN,N,XO)

IF (IWR.EQ.1) WRITE (6,54) (XO(L),L=1,N)
FORMAT(/' OUTPUT SAMPLES'/(1X,10F13.5))

IF (IXPLT.EQ.4) CALL VPLT(XO)

WRITE (3,15) (XO(J),J=1,N)

CONTINUE

IPEN = 999

CALL PLOT (A,8,IPEN)

STOP

END
SUBROUTINE AUTO (S,N,A,IP,IWIN,IWR)

DETERMINE LINEAR PREDICTION COEFFICIENTS FOR A SET OF

INPUT SAMPLES USING THE AUTOCORRELATION METHOD

S = VECTOR OF INPUT SAMPLES

N = NUMBER OF SAMPLES

A = VECTOR OF PREDICTOR COEFFICIENTS

IP = NUMBER OF PREDICTOR COEFF (ORDER OF MODEL)

IP.LT.17

IWIN = TYPE OF WINDOW (SEE SUBR WINDOW)

IWR = 0 NO PRINTING OF PREDICTION COEFFICIENTS

REF: MAKHOUL: LINEAR PREDICTION
PROC IEEE, APR 75

DIMENSION S(1),T(512),R(16),A(1)

CALL WINDW (S,T,N,IWIN)

CALCULATE AUTOCORRELATION

RO = 0.0

DO 10 I=1,N

RO = RO + T(I)**2

CONTINUE

DO 30 J=1,IP

SUM = 0.0

NN = N-J

DO 20 I=1,NN

SUM = SUM+T(I)*T(I+J)

CONTINUE
R(J) = SUM
CONTINUE

IF (IWR.EQ.1) WRITE (6,31) RO,(R(L),L=1,IP)

FORMAT(/1X,'AUTOCOREL VALS',F16.5/1X,8F16.5/1X,8F16.5)

SOLVE MATRIX EQN FOR A VECTOR

CALL COEFF(RO,R,IP,A,IWR)

TAKE NEGATIVE OF PREDICTOR COEFF TO GET
COEFF OF CHARACTERISTIC EQN OF FILTER


DO 60 I = 1,IP
A(I) = -A(I)


CONTINUE

IF (IWR.NE.0) WRITE (6,70) ((I,A(I)),I=1,IP)

FORMAT(/1X,'PREDICTOR COEFFICIENTS'/(10X,I3,1X,F13.8))

RETURN

END
c

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

8 

c 

c 

c 

c 

c 

c 

c 

c 

c 


c 

c 

c 


SUBROUTINE COEFF (RO,R,N,A,IWR)
SOLVES THE MATRIX EQUATION RR A = R

RR = AUTOCORRELATION MATRIX

     R(0)    R(1)    R(2)   ...  R(N-1)
     R(1)    R(0)    R(1)   ...  R(N-2)
     R(2)    R(1)    R(0)   ...  R(N-3)
      .       .       .           .
     R(N-1)  R(N-2)  R(N-3) ...  R(0)

R = AUTOCORRELATION VECTOR

     R(1)
     R(2)
     R(3)
      .
     R(N)

A = VECTOR OF PREDICTOR COEFF

     A(1)
     A(2)
     A(3)
      .
     A(N)

METHOD ATTRIBUTED TO DURBIN AS DESCRIBED IN
'LINEAR PREDICTION' BY MAKHOUL, PROC IEEE APR 75,
P. 566

DIMENSION AK(20),AO(20),A(20),R(20)

FIRST ITERATION
EO = RO

AK(1) = R(1)/EO
A(1) = AK(1)

E = (1.0-AK(1)**2)*EO
EO = E

AO(1) = A(1)

FOLLOWING ITERATIONS

DO 100 I = 2,N
IM1 = I-1
SUM = 0.0

DO 20 J = 1,IM1
IMJ = I-J

SUM = SUM+R(IMJ)*AO(J)

20 CONTINUE

AK(I) = (R(I)-SUM)/EO
A(I) = AK(I)

DO 30 J = 1,IM1
IMJ = I-J

A(J) = AO(J)-AK(I)*AO(IMJ)

30 CONTINUE

E = (1.0-AK(I)**2)*EO
EO = E

DO 50 J = 1,I
AO(J) = A(J)
50 CONTINUE

100 CONTINUE
C
C PRINT E (REMAINING ERROR DUE TO LIMITING
C ORDER OF APPROXIMATION) AND A CHECK OF SOLUTION
C IF DESIRED
C

IF (IWR.EQ.1) WRITE (6,101) E

101 FORMAT(' SUB COEFF E= ',F18.8)
IF (IWR.EQ.1) CALL TEST (A,RO,R,N)

RETURN

END
SUBROUTINE TEST (A,RO,R,IP)

C

C MULTIPLIES PREDICTOR COEFF VECTOR
C A BY THE AUTOCORRELATION MATRIX RR AND CHECKS

C THE VALUE AGAINST THE AUTOCORRELATION VECTOR
C TO INSURE ACCURATE SOLUTION.
C

DIMENSION A(IP),R(IP)

DO 10 I = 1,IP
SUM = 0.0

DO 9 J = 1,IP
L = IABS(I-J)

IF (L.EQ.0) SUM = SUM+A(J)*RO
IF (L.NE.0) SUM = SUM+A(J)*R(L)

9 CONTINUE
WRITE (6,15) I,R(I),SUM

15 FORMAT(' R(',I2,') = ',2E14.4,' = SUM')

10 CONTINUE
RETURN
END


C 

C 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 

c 


10 


20 


25 

30 


SUBROUTINE POLES (A,IP,P,IWR,ICK)

CALCULATES POLES OF CHARACTERISTIC EQN FROM
PREDICTOR COEFFICIENTS AND IF WANTED PRINTS
OR PLOTS THOSE POLES

A = VECTOR OF PREDICTOR COEFFICIENTS
IP = NUMBER OF COEFF AND POLES

COEFF A0 IS ASSUMED TO BE 1.0
P = COMPLEX VECTOR OF POLE LOCATIONS
IWR = 0 NO PRINTING OF POLES
ICK = 0 ALL POLES INSIDE UNIT CIRCLE
    = 1 POLE OUTSIDE UNIT CIRCLE

DIMENSION A(1),B(21),X(20),Y(20),NAME(20)

COMPLEX P(1)

B(1) = 1.0
DO 10 I=1,IP
II = I+1
B(II) = A(I)

CONTINUE
IIP = IP+1

CALL ZRPOLY(B,IP,P,IER)

IF (IWR.NE.0) WRITE (6,20) ((I,P(I)),I=1,IP)

FORMAT(//10X,'POLES OF CHAR EQN'/(10X,I3,1X,2E14.7))
ICK = 0

DO 30 I = 1,IP
IF (CABS(P(I)).LE.1.0) GO TO 30
ICK = 1

IF (IWR.NE.0) WRITE (6,25) I

FORMAT(20X,'POLE NUMBER ',I3,

* ' ABOVE IS OUTSIDE UNIT CIRCLE')

CONTINUE
RETURN
END
      SUBROUTINE (S,N,A,IP,E,SX)
C
C     DETERMINE AN ERROR VECTOR OF DIFFERENCE
C     BETWEEN ACTUAL SAMPLE VALUES AND THE
C     VALUES PREDICTED FROM PAST SAMPLES.
C
C     S  = VECTOR OF SAMPLES
C     N  = NUMBER OF SAMPLES
C     A  = VECTOR OF PREDICTOR COEFF
C     IP = NUMBER OF PREDICTOR COEFF
C     E  = VECTOR OF ERROR VALUES
C     SX = EXTRA SAMPLES (IP OF THEM)
C          SAVED FROM LAST FRAME
C
C     THE ERROR IS THE DIFFERENCE BETWEEN THE
C     CURRENT SAMPLE AND THE WEIGHTED SUM OF
C     THE LAST IP SAMPLES.
C
      DIMENSION S(1),A(1),E(1),T(542),SX(1)
      DO 10 I=1,IP
      T(I) = SX(I)
   10 CONTINUE
      DO 20 I=1,N
      T(I+IP) = S(I)
   20 CONTINUE
      DO 40 I=1,N
      SUM = 0.0
      DO 30 J=1,IP
      II = I+J-1
      JJ = IP-J+1
      SUM = SUM+T(II)*A(JJ)
   30 CONTINUE
      E(I) = S(I)+SUM
   40 CONTINUE
      RETURN
      END
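This routine computes E(I) = S(I) + sum over J of A(J)*S(I-J), for J = 1 to IP. It takes the first IP "past" samples from SX, so the inverse filter runs continuously across frame boundaries. Below is a Python sketch of the same computation. It also returns the history to pass to the next frame, a step the listing leaves to the calling program. `lpc_residual` is an illustrative name.

```python
def lpc_residual(s, a, sx):
    """Residual e[i] = s[i] + sum_k a_k s[i-k] for one frame.

    sx holds the last p samples of the previous frame (oldest first),
    playing the role of SX; returns (e, new_sx).
    """
    p = len(a)
    t = list(sx) + list(s)           # T = [SX(1..IP), S(1..N)]
    e = []
    for i in range(len(s)):
        acc = 0.0
        for k in range(1, p + 1):
            acc += a[k - 1] * t[i + p - k]   # t[i + p - k] is s[i - k]
        e.append(s[i] + acc)
    return e, t[-p:]
```

If the frame was itself generated by the all-pole model with coefficients `a`, the residual reduces to the excitation.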


THIS PAGE IS BEST QUALITY PRACTICABLE
FROM COPY FURNISHED TO DDC


      SUBROUTINE PITCH(N,E,EF,ES,EFS,IPP,IWR)
C
C     DETERMINES PITCH PERIOD (IN NUMBER OF SAMPLES)
C     FROM THE ERROR SIGNAL OF INVERSE FILTERED SPEECH
C
C     N   = NUMBER OF SAMPLES
C     E   = ERROR VECTOR
C     EF  = FILTERED ERROR VECTOR (OUTPUT)
C     ES  = FIVE SAVED ERROR SAMPLES
C     EFS = FIVE SAVED FILTERED ERROR SAMPLES
C     IPP = PITCH PERIOD (OUTPUT)
C     IWR = 1 FOR PRINTING DURING SUBROUTINE
C
      DIMENSION ES(5),EFS(5),E(1),EF(1),R(256)
      DIMENSION XI(261),XO(261)
C
C     FORM FILTERING VECTORS
C
      DO 10 I=1,5
      XI(I) = ES(I)
      XO(I) = EFS(I)
   10 CONTINUE
      ITEMP = N+5
      DO 15 I=6,ITEMP
      II = I-5
      XI(I) = E(II)
   15 CONTINUE
      DO 20 I=6,ITEMP
C
C     BUTTERWORTH DIGITAL FILTER CUTOFF AT 800 HZ
C
      XO(I) = 0.447451239E-3*XI(I)+0.22272562E-2*XI(I-1)
     *  +0.44745124E-2*XI(I-2)+0.447451239E-2*XI(I-3)
     *  +0.22272562E-2*XI(I-4)+0.447451239E-3*XI(I-5)
     *  +3.41077231*XO(I-1)-4.73230657*XO(I-2)
     *  +3.42533523*XO(I-3)-1.24929545*XO(I-4)
     *  +0.185257941*XO(I-5)
      EF(I-5) = XO(I)
   20 CONTINUE
      DO 30 I=1,5
      ES(I) = E(I+N-5)
      EFS(I) = EF(I+N-5)
   30 CONTINUE
      IWIN = 4
      CALL WINDW(EF,XO,N,IWIN)
C
C     CHECK FOR PEAKS 1.2 TO 13.0 MSEC
C
      ITEMP0 = N-56
      IF(IWR.EQ.1) WRITE(6,33) ((EF(L),XO(L)),L=1,N)
   33 FORMAT(1X,10F13.5)
      DO 50 I=1,ITEMP0
      SUM = 0.0
      ITEMPA = N-I
      DO 40 J=1,ITEMPA
      SUM = SUM+XO(J)*XO(J+I)
   40 CONTINUE
      R(I) = SUM
      IF(IWR.EQ.1) WRITE(6,41) I,R(I)
   41 FORMAT(' FILTERED ERROR AUTOCORRELATION FOR',I4,F18.8)
      IF(I.LT.33) GO TO 50
      IF(R(ITEST).LT.0.0) GO TO 50
      ITEMPB = I-34
      DO 45 J=ITEMPB,I
      IF(R(ITEST).LT.R(J)) GO TO 50
   45 CONTINUE
      IPP = ITEST
      WRITE(6,46) IPP
   46 FORMAT(' PITCH PERIOD IS',I4)
      RETURN
   50 CONTINUE
      IPP = 100
      WRITE(6,55)
   55 FORMAT(///' SUB PITCH FAILED TO DETERMINE CORRECT'
     * /' PITCH, PITCH PERIOD SET EQUAL TO 100'///)
      RETURN
      END
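PITCH works in several stages. It low-pass filters the residual at 800 Hz and applies a Hanning window. It then searches the autocorrelation for a peak at lags of roughly 1.2 to 13 ms. If no peak is found, it falls back to a period of 100 samples. The Python sketch below is a simplified version that keeps only the windowed-autocorrelation search. It omits the Butterworth filter and replaces the listing's local-peak test with a plain maximum over the lag range. The lag limits `lo` and `hi` are hypothetical parameters; the caller would convert 1.2 and 13 ms to samples at the actual sampling rate.

```python
import math

def hanning(n):
    """Hanning window with the WINDW formula 0.5*(1 - cos(2*pi*j/(n-1)))."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * j / (n - 1))) for j in range(n)]

def estimate_pitch(e, lo, hi, default=100):
    """Lag in [lo, hi] with the largest positive windowed autocorrelation.

    Returns `default` (100, as in PITCH) when no positive value is found.
    """
    w = hanning(len(e))
    x = [ei * wi for ei, wi in zip(e, w)]
    n = len(x)
    best_lag, best_val = default, 0.0
    for lag in range(lo, min(hi, n - 1) + 1):
        r = sum(x[j] * x[j + lag] for j in range(n - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return best_lag
```

For an impulse train with a 40-sample period, the sketch returns 40. Multiples of the period give smaller values because fewer windowed pulse pairs overlap.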


      SUBROUTINE ALT2(P,FSC,BSC,RSC,RLIM,SP,IP,IWR,IXPLT)
C
C     GIVEN IP COMPLEX POLES OF THE VOCAL TRACT
C     TRANSFER FUNCTION, CALCULATES THE FORMANT
C     FREQUENCIES AND BANDWIDTHS AND SCALES THEM
C     AS DESIRED.  PRINTED OUTPUT IS AVAILABLE.
C
C     P     = VECTOR OF IP COMPLEX POLES
C     FSC   = FREQUENCY SCALE FACTOR OUT/IN
C     RSC   = REAL POLE SCALE FACTOR
C     RLIM  = REAL POLE MAGNITUDE LIMIT
C     BSC   = BANDWIDTH SCALE FACTOR OUT/IN
C     IP    = NUMBER OF POLES
C     SP    = SAMPLE PERIOD IN SECONDS
C     IWR   = 0 NO OUTPUT PRINTED
C           = 1 PRINTED RESULTS
C     IXPLT = 5 FOR PLOT OF POLES
C
      DIMENSION FORF(14),BW(14)
      COMPLEX P(1),CPP(14),CRP(14),CTEM
      DIMENSION XP(6),YP(6),IIPEN(6)
      DATA XP/3.0,2.75,-2.75,0.0,0.0,2.5/
      DATA YP/10.0,0.0,0.0,2.75,-2.75,0.0/
      DATA IIPEN/-3,3,2,3,2,3/
      ZERO = 0.0
      IF(IXPLT.NE.5) GO TO 9
      NPEN = 3
      CALL NEWPEN(NPEN)
      DO 2 I=1,6
      CALL PLOT(XP(I),YP(I),IIPEN(I))
    2 CONTINUE
      IPEN = 2
      DO 4 I=1,241
      TEM = 0.02618*FLOAT(I)
      XX = 2.5*COS(TEM)
      YY = 2.5*SIN(TEM)
      CALL PLOT(XX,YY,IPEN)
    4 CONTINUE
      IPEN = 3
      CALL PLOT(ZERO,ZERO,IPEN)
      HIEG = 0.25
      ANG = 0.0
      NC = -1
      ITEXT = 4
      NPEN = 4
      CALL NEWPEN(NPEN)
      DO 6 I=1,IP
      XX = 2.5*REAL(P(I))
      YY = 2.5*AIMAG(P(I))
      CALL SYMBOL(XX,YY,HIEG,ITEXT,ANG,NC)
    6 CONTINUE
C
C     TEST EACH POLE AND PLACE IN PROPER ARRAY
C
    9 ICP = 0
      IRP = 0
      DO 40 I=1,IP
      IF(AIMAG( ...
      IF(AIMAG(P(I)).EQ.0.0) GO TO 30
      IF(ICP.EQ.0) GO TO 20
      DO 10 J=1,ICP
      IF(CABS( ...
   10 CONTINUE
   20 ICP = ICP+1
      CPP(ICP) = P(I)
      GO TO 40
   30 IRP = IRP+1
      CRP(IRP) = P(I)
   40 CONTINUE
C
C     CALCULATE FORMANT FREQ AND BANDWIDTH FOR EACH
C
      DO 50 I=1,ICP
      A = CABS(CPP(I))
      BW(I) = (0.0-ALOG(A))/(6.2831852*SP)
      TH = ATAN2(AIMAG(CPP(I)),REAL(CPP(I)))
      TH = ABS(TH)
      FORF(I) = TH/(SP*6.2831852)
   50 CONTINUE
      ICPM1 = ICP-1
      DO 60 I=1,ICPM1
      IP1 = I+1
      DO 55 J=IP1,ICP
      IF(FORF(I).LT.FORF(J)) GO TO 55
      TEM = BW(I)
      BW(I) = BW(J)
      BW(J) = TEM
      TEM = FORF(I)
      FORF(I) = FORF(J)
      FORF(J) = TEM
      CTEM = CPP(I)
      CPP(I) = CPP(J)
      CPP(J) = CTEM
   55 CONTINUE
   60 CONTINUE
      IF(IWR.EQ.1) WRITE(6,70) ((I,CPP(I),FORF(I),BW(I)),I=1,ICP)
   70 FORMAT(' FORMANT',I3,' DUE TO POLES AT Z=',F8.4,
     * ' J*',F8.4,' FORMANT FREQ=',F8.1,' BANDWIDTH=',F8.1)
      IF(IRP.EQ.0) GO TO 85
      IF(IWR.EQ.1) WRITE(6,80) ((I,CRP(I)),I=1,IRP)
   80 FORMAT(' REAL POLE NUMBER ',I3,' AT Z = ',2F8.4)
   85 IF(IWR.EQ.1) WRITE(6,90) FSC,BSC,RSC,RLIM,SP
   90 FORMAT(/' FORMANT FREQUENCY SCALE FACTOR = ',F9.4,
     * ' BANDWIDTH SCALE FACTOR =',F8.4/
     * ' REAL POLE SCALE FACTOR =',F8.4,
     * ' REAL POLE MAGNITUDE LIMIT = ',F8.4,
     * ' SAMPLE PERIOD =',F9.6//' AFTER MODIFICATION')
C
C     ALTER FORMANT FREQUENCIES AND BANDWIDTHS
C
      DO 100 I=1,ICP
      A = CABS(CPP(I))**BSC
      IF(A.GT.0.98) A = 0.99
      TH = ATAN2(AIMAG(CPP(I)),REAL(CPP(I)))*FSC
      TH = ABS(TH)
      CPP(I) = A*CMPLX(COS(TH),SIN(TH))
      BW(I) = (0.0-ALOG(A))/(6.2831852*SP)
      FORF(I) = TH/(6.2831852*SP)
  100 CONTINUE
C
C     ALTER REAL POLE LOCATIONS
C
      IF(IRP.EQ.0) GO TO 115
      DO 110 I=1,IRP
      CRP(I) = CRP(I)*RSC
      TEM = CABS(CRP(I))
      IF(TEM.GT.RLIM) CRP(I) = CRP(I)*RLIM/TEM
  110 CONTINUE
  115 IF(IWR.EQ.1) WRITE(6,70) ((I,CPP(I),FORF(I),BW(I)),I=1,ICP)
      IF(IRP.EQ.0) GO TO 118
      IF(IWR.EQ.1) WRITE(6,80) ((I,CRP(I)),I=1,IRP)
C
C     RECONSTRUCT ARRAY OF POLES
C
  118 IND = 0
      DO 120 I=1,ICP
      IND = IND+1
      P(IND) = CPP(I)
      IND = IND+1
      P(IND) = CONJG(CPP(I))
  120 CONTINUE
      IF(IRP.EQ.0) GO TO 135
      DO 130 I=1,IRP
      IND = IND+1
      P(IND) = CRP(I)
  130 CONTINUE
  135 IF(IWR.EQ.1) WRITE(6,140) IND
  140 FORMAT(10X,'RECON POLES',I4)
      IF(IXPLT.NE.5) RETURN
      ITEXT = 3
      DO 150 I=1,IP
      XX = 2.5*REAL(P(I))
      YY = 2.5*AIMAG(P(I))
      CALL SYMBOL(XX,YY,HIEG,ITEXT,ANG,NC)
  150 CONTINUE
      IPEN = -3
      XX = 5.0
      YY = -10.0
      CALL PLOT(XX,YY,IPEN)
      RETURN
      END
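ALT2 treats each complex pole z = r*exp(j*theta) as a formant with:

- frequency F = theta/(2*pi*T);
- bandwidth BW = -ln(r)/(2*pi*T), where T is the sample period SP.

Multiplying theta by FSC scales the formant frequency. Raising r to the power BSC scales BW by BSC. With r = exp(-pi*B*T), the conventional 3 dB bandwidth is -ln(r)/(pi*T). The listing's 2*pi divisor therefore reports half of that value, which does not affect the scaling.

The Python sketch below implements those two steps. The clamp value `rmax` is an assumption standing in for the listing's limit of about 0.99, and the function names are illustrative.

```python
import cmath
import math

def formant(z, sp):
    """(frequency, bandwidth) in Hz for pole z, using the ALT2 formulas."""
    r = abs(z)
    freq = abs(cmath.phase(z)) / (2.0 * math.pi * sp)
    bw = -math.log(r) / (2.0 * math.pi * sp)
    return freq, bw

def scale_pole(z, fsc, bsc, rmax=0.99):
    """Scale the pole angle by fsc and the radius to r**bsc (bandwidth x bsc)."""
    r = min(abs(z) ** bsc, rmax)     # hypothetical clamp keeping |z| < 1
    theta = abs(cmath.phase(z)) * fsc
    return cmath.rect(r, theta)
```

With FSC = 0.88, a formant at 500 Hz moves to 440 Hz and its bandwidth is unchanged.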
      SUBROUTINE NEWCF(IP,P,A,IWR)
C
C     DETERMINES THE COEFFICIENTS OF THE
C     PREDICTOR POLYNOMIAL FROM THE ROOTS OF
C     THE CHARACTERISTIC EQUATION
C
C     IP  = ORDER OF THE POLYNOMIAL
C     P   = COMPLEX ROOTS OF CHARACTERISTIC EQN
C           I.E. POLES OF THE FILTER
C     A   = ARRAY OF REAL COEFFICIENTS
C     IWR = 1 FOR PRINTING DURING SUBROUTINE
C
C     IF ALL COMPLEX ROOTS ARE IN CONJUGATE PAIRS
C     ALL OF THE COEFFICIENTS SHOULD BE REAL.
C     THIS CAN BE CHECKED WITH OUTPUT.
C
      COMPLEX*16 PP(14),AA(14)
      COMPLEX*8 P(IP)
      REAL*4 A(IP)
      Z = 0.0
      DO 10 I=1,IP
      AA(I) = P(I)
      PP(I) = P(I)
   10 CONTINUE
      K = IP
      M = IP-1
      DO 40 L=1,M
      DO 30 I=2,K
      AA(I) = AA(I)+AA(I-1)
   30 CONTINUE
      K = K-1
      DO 20 I=1,K
      J = I+L
      AA(I) = PP(J)*AA(I)
   20 CONTINUE
   40 CONTINUE
      K = IP/2
      K = 2*K/(IP-K)
      DO 50 I=K,IP,2
      AA(I) = -AA(I)
   50 CONTINUE
      DO 60 I=1,IP
      J = IP+1-I
      A(J) = REAL(AA(I))
      PP(J) = AA(I)
   60 CONTINUE
      IF(IWR.NE.1) RETURN
      WRITE(6,70) ((I,PP(I)),I=1,IP)
   70 FORMAT(/' RECONSTRUCTED POLYNOMIAL COEFFICIENTS'/
     * 20X,'IMAGINARY TERMS SHOULD BE ZERO'/
     * (1X,I5,F18.3,E18.4))
      RETURN
      END
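NEWCF rebuilds the predictor coefficients from the modified poles by forming the elementary symmetric functions of the roots in place. Its sign flip alternates over the odd-order terms. The Python sketch below reaches the same coefficients by a simpler, swapped-in method: it multiplies out the factors (z - p_k) one at a time. As in the listing's printed check, the largest leftover imaginary part confirms that the poles came in conjugate pairs. `poles_to_coeffs` is an illustrative name.

```python
def poles_to_coeffs(poles):
    """Return ([a_1..a_p], max |imag|) with prod(z - p_k) = z^p + a_1 z^(p-1) + ... + a_p."""
    c = [1 + 0j]                     # coefficients, highest power first
    for pk in poles:
        nxt = c + [0j]               # multiply by z ...
        for i in range(1, len(nxt)):
            nxt[i] -= pk * c[i - 1]  # ... then subtract pk times the old polynomial
        c = nxt
    imag = max(abs(x.imag) for x in c)
    return [x.real for x in c[1:]], imag
```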




      SUBROUTINE RECON(A,IP,RMS,IVF,IPP,N,S)
C
C     RECONSTRUCTS SPEECH SAMPLES FROM LPC COEFF, ETC.
C
C     A   = VECTOR OF LPC COEFF
C     IP  = NUMBER OF COEFF (ORDER OF FILTER)
C     RMS = RMS VALUE OF ERROR SIGNAL
C     IVF = 0 UNVOICED
C         = 1 VOICED
C     IPP = PITCH PERIOD IN NUMBER OF SAMPLES
C     N   = SAMPLES PER FRAME
C     S   = SAMPLE VECTOR (OUTPUT)
C
      DIMENSION A(1),S(1),X(270),XX(14),AO(14)
      DATA XX,RMSO,ISEED,IVFO/15*0.0,1234,0/
      DO 10 I=1,IP
      X(I) = XX(I)
   10 CONTINUE
      NIP = N+IP
      NS = 1+IP
C
C     IF CURRENT PULSE UNFINISHED DON'T CHANGE COEFF YET
C
      IF(IVFO.NE.0) GO TO 400
C
C     UPDATE COEFF
C
  100 RMSO = SQRT(RMSO*RMS)
      IF(IVF.EQ.0) RMSO = RMS
      IF(RMSO.LT.(RMS/2.0)) RMSO = RMS/2.0
      DO 105 I=1,IP
      AO(I) = A(I)
  105 CONTINUE
      IVFO = IVF
      IPPO = IPP
C
C     TEST IF VOICED
C
      IF(IVFO.NE.0) GO TO 300
C
C     RECONSTRUCT UNVOICED SPEECH
C
  200 E = RMSO*GGNOF(ISEED)
      DO 210 I=1,IP
      NSMI = NS-I
      E = E-AO(I)*X(NSMI)
  210 CONTINUE
      X(NS) = E
      IF(NS.GE.NIP) GO TO 600
      NS = NS+1
      GO TO 200
C
C     START VOICED PULSE
C
  300 NP = 1
      EX = RMSO*SQRT(FLOAT(IPPO))
C
C     TEST FOR BEGINNING OF PULSE PERIOD
C
  400 IF(NP.GT.IPPO) GO TO 100
      E = 0.0
      IF(NP.EQ.1) E = -EX
C
C     RECONSTRUCT VOICED SPEECH
C
  500 DO 510 I=1,IP
      NSMI = NS-I
      E = E-AO(I)*X(NSMI)
  510 CONTINUE
      NP = NP+1
      X(NS) = E
      IF(NS.GE.NIP) GO TO 600
      NS = NS+1
      GO TO 400
C
C     SAVE VALUES AND PREPARE OUTPUT
C
  600 DO 610 I=1,IP
      XX(I) = X(N+I)
  610 CONTINUE
      DO 620 I=1,N
      S(I) = X(I+IP)
  620 CONTINUE
      RETURN
      END
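RECON drives the all-pole filter 1/A(z) with one of two excitations:

- unvoiced frames: Gaussian noise of standard deviation RMSO;
- voiced frames: one pulse of height -RMSO*SQRT(IPPO) at the start of each pitch period.

A single pulse of height h every P samples has mean power h*h/P. Choosing h = RMS*sqrt(P) therefore gives the excitation the same RMS value as the measured residual.

The Python sketch below synthesizes one frame. It is a simplification: it leaves out the listing's RMS smoothing and its rule of holding the coefficients until a pulse period ends. The function name and the seed are illustrative.

```python
import math
import random

def synthesize(a, rms, voiced, period, n, mem, phase=0, rng=None):
    """Synthesize n samples through 1/A(z), A(z) = 1 + sum a_k z^-k.

    mem   -- last p output samples of the previous frame, oldest first
    phase -- samples already elapsed in the current pitch period
    Returns (samples, new_mem, new_phase).
    """
    if rng is None:
        rng = random.Random(1234)        # illustrative seed, cf. ISEED in RECON
    p = len(a)
    hist = list(mem)
    height = rms * math.sqrt(period)     # pulse height giving the requested RMS
    out = []
    for _ in range(n):
        if voiced:
            e = -height if phase == 0 else 0.0
            phase = (phase + 1) % period
        else:
            e = rms * rng.gauss(0.0, 1.0)
        # all-pole recursion: y[n] = e[n] - sum_k a_k y[n-k]
        y = e - sum(a[k] * hist[-1 - k] for k in range(p))
        hist.append(y)
        out.append(y)
    return out, hist[-p:], phase
```

For example, with a flat filter, RMS 1 and a period of 4 samples, the output is a pulse of -2 every fourth sample, whose RMS is exactly 1.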




      SUBROUTINE RMS(X,N,VAL)
C
C     DETERMINE THE RMS VALUE OF A SET OF DATA
C
C     X   = VECTOR OF INPUT SAMPLES
C     N   = NUMBER OF SAMPLES
C     VAL = RMS VALUE RETURNED
C
      DIMENSION X(1)
      VAL = 0.0
      DO 10 I=1,N
      VAL = VAL+X(I)**2
   10 CONTINUE
      VAL = SQRT(VAL/FLOAT(N))
      RETURN
      END
      SUBROUTINE WINDW(X,Y,N,IWIN)
C
C     X    = VECTOR OF UNWINDOWED SAMPLES
C     Y    = VECTOR OF WINDOWED SAMPLES (OUTPUT)
C     N    = NUMBER OF SAMPLES
C     IWIN = TYPE OF WINDOW
C            0  RECTANGULAR (COPY ONLY)
C            1  HAMMING (ALPHA = 0.54)
C            2  BARTLETT
C            3  BLACKMAN
C            4  HANNING
C
      DIMENSION X(1),Y(1)
      DATA PI,TWOPI,FORPI/3.1415926,6.2831853,12.566371/
      IF(IWIN.LT.0.OR.IWIN.GT.4) GO TO 999
      AN = FLOAT(N)
      GO TO (110,210,310,410),IWIN
C
C     RECTANGULAR WINDOW, COPY VECTOR
C
   10 DO 20 I=1,N
      Y(I) = X(I)
   20 CONTINUE
      RETURN
C
C     HAMMING WINDOW
C
  110 DO 120 I=1,N
      AJ = FLOAT(I-1)
      Y(I) = X(I)*(0.54-0.46*COS(TWOPI*AJ/(AN-1.0)))
  120 CONTINUE
      RETURN
C
C     BARTLETT WINDOW
C
  210 NN = N/2
      NNN = NN+1
      DO 220 I=1,NN
      AJ = FLOAT(I-1)
      Y(I) = X(I)*2.0*AJ/(AN-1.0)
  220 CONTINUE
      DO 230 I=NNN,N
      AJ = FLOAT(I-1)
      Y(I) = X(I)*2.0*(1.0-AJ/(AN-1.0))
  230 CONTINUE
      RETURN
C
C     BLACKMAN WINDOW
C
  310 DO 320 I=1,N
      AJ = FLOAT(I-1)
      Y(I) = X(I)*(0.42-0.5*COS(TWOPI*AJ/(AN-1.0))
     *       +0.08*COS(FORPI*AJ/(AN-1.0)))
  320 CONTINUE
      RETURN
C
C     HANNING WINDOW
C
  410 DO 420 I=1,N
      AJ = FLOAT(I-1)
      Y(I) = X(I)*0.5*(1.0-COS(TWOPI*AJ/(AN-1.0)))
  420 CONTINUE
      RETURN
  999 WRITE(6,998)
  998 FORMAT(//10X,'*** ERROR SUBR WINDOW ***'//)
      STOP
      END
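WINDW evaluates the usual window formulas with j = I-1 running from 0 to N-1 and N-1 in the denominator. The Python sketch below covers the same five cases and can be used for checking values. The IWIN codes follow the listing; `window` is an illustrative name.

```python
import math

def window(x, iwin):
    """Return x multiplied by the window selected by iwin (0..4, as in WINDW)."""
    if iwin not in (0, 1, 2, 3, 4):
        raise ValueError("IWIN must be 0..4")
    n = len(x)
    d = float(n - 1)

    def w(j):
        if iwin == 0:                      # rectangular: copy only
            return 1.0
        if iwin == 1:                      # Hamming, alpha = 0.54
            return 0.54 - 0.46 * math.cos(2.0 * math.pi * j / d)
        if iwin == 2:                      # Bartlett: rising half, then falling half
            return 2.0 * j / d if j < n // 2 else 2.0 * (1.0 - j / d)
        if iwin == 3:                      # Blackman
            return (0.42 - 0.5 * math.cos(2.0 * math.pi * j / d)
                    + 0.08 * math.cos(4.0 * math.pi * j / d))
        return 0.5 * (1.0 - math.cos(2.0 * math.pi * j / d))   # Hanning

    return [xi * w(j) for j, xi in enumerate(x)]
```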
      SUBROUTINE VPLTIN(N)
C
C     SUBROUTINE CREATES A VERSAPLOT GRAPH OF 60 FRAMES
C     OF VOICE SAMPLES (128 SAMPLES / FRAME)
C
C     CALL VPLTIN TO INITIALIZE EACH PLOT
C     CALL VPLT FOR EACH FRAME
C
C     N = NUMBER OF SAMPLES PER FRAME
C     Y = VECTOR OF SAMPLES
C
C     CALLING PROGRAM SHOULD ISSUE
C     CALL PLOT(X,Y,999)
C     TO COMPLETE PLOTTING
C
      DIMENSION X(768),Y(256),XO(8),YO(8)
      DATA XO/0.0,0.0,7.0,0.0,7.0,0.0,7.0,0.0/
      DATA YO/10.,-10.,10.,-10.,10.,-10.,10.,-10./
      DO 10 I=1,768
      X(I) = FLOAT(I)/128.0
   10 CONTINUE
      CALL PLOTS(IA,IB,IC)
      NPEN = 2
      CALL NEWPEN(NPEN)
      NPLT = 1
      IPEN = -3
      CALL PLOT(XO(NPLT),YO(NPLT),IPEN)
      IPEN = 2
      IX = 768
      IY = 11
      RETURN
C
      ENTRY VPLT(Y)
      DO 100 I=1,N
      IX = IX+1
      IF(IX.LE.768) GO TO 50
      IX = 1
      IY = IY-1
      YS = 2.0+0.7*FLOAT(IY)
      IF(IY.GE.1) GO TO 40
      NPLT = NPLT+1
      IPEN = -3
      CALL PLOT(XO(NPLT),YO(NPLT),IPEN)
      IPEN = 2
      IX = 1
      IY = 10
      YS = 2.0+0.7*FLOAT(IY)
   40 IPEN = 3
      YY = Y(I)/100.0+YS
      CALL PLOT(X(IX),YY,IPEN)
      IPEN = 2
      GO TO 100
   50 YY = Y(I)/100.0+YS
      CALL PLOT(X(IX),YY,IPEN)
  100 CONTINUE
      RETURN
      END




APPENDIX A.3


POWER SPECTRAL DENSITY ANALYSIS AND
PLOTTING PROGRAM


      DIMENSION X(256)
      READ(5,8,END=90) INUM,ISKIP,IWIN
    8 FORMAT(3I5)
      IF(ISKIP.EQ.0) GO TO 10
      DO 9 I=1,ISKIP
      READ(2,25,END=90) X
    9 CONTINUE
   10 M = 8
      READ(2,25,END=90) X
      CALL PSDINT(X,M)
      CALL SPLINT
      K = 0
   20 READ(2,25,END=90) X
   25 FORMAT(128A4)
      CALL PSD(X,M,IWIN)
      IF(K.LE.6) WRITE(6,30) (X(J),J=1,128)
   30 FORMAT(1X,10F12.5)
      K = K+1
      CALL SPL(X)
      IF(K.GT.INUM) GO TO 90
      GO TO 20
   90 IPEN = 999
      CALL PLOT(AX,Y,IPEN)
      STOP
      END




      SUBROUTINE SPLINT
C
C     SUBROUTINE PLOTS THE POWER SPECTRAL DENSITY
C     (LOG OF MAGNITUDE) FOR 128 FREQUENCIES WHICH
C     IS INPUT IN MAGNITUDE FORM IN VECTOR Y
C
C     VALUES IN Y SHOULD BE BETWEEN 0.01 AND 100.0
C
C     CALL SPLINT TO INITIALIZE PLOTTING
C     CALL SPL(Y) FOR EVERY SET OF 128 PSD VALUES
C
C     CALLING PROGRAM SHOULD ISSUE CALL PLOT(X,Y,999)
C     WHEN PLOTTING IS COMPLETE
C
      DIMENSION Y(1),X(128),YY(128)
      DIMENSION RORGX(6),RORGY(6),GX(19),GY(19),IGP(19)
C
C     DATA FOR SIX PLOT ORIGINS
C
      DATA RORGX/0.1,-1.2,-1.2,8.8,-1.2,-1.2/
      DATA RORGY/0.5,4.0,4.0,-17.0,4.0,4.0/
C
C     DATA TO PLOT AXIS
C
      DATA GX/7.5,7.5,6.0,6.0,4.5,4.5,3.0,3.0,1.5,
     * 1.5,0.0,0.0,-0.1,0.0,-0.1,0.0,-0.1,0.0,-0.1/
      DATA GY/0.0,-0.1,0.0,-0.1,0.0,-0.1,0.0,-0.1,
     * 0.0,-0.1,-0.1,0.800,0.800,0.600,0.600,0.4,0.4,
     * 0.200,0.200/
      DATA IGP/2,2,2,2,3,2,3,2,3,2,3,2,2,2,2,3,2,3,2/
      DO 10 I=1,128
      X(I) = FLOAT(I-1)*0.05859-0.04
   10 CONTINUE
      CALL PLOTS(IA,IB,IC)
      IFLAG = 0
      IPLTN = 1
      ISCAN = 0
   15 IPEN = -3
      CALL PLOT(RORGX(IPLTN),RORGY(IPLTN),IPEN)
      NPEN = 4
      CALL NEWPEN(NPEN)
      DO 30 I=1,19
      CALL PLOT(GX(I),GY(I),IGP(I))
   30 CONTINUE
      NPEN = 2
      CALL NEWPEN(NPEN)
      RETURN
C
      ENTRY SPL(Y)
      ISCAN = ISCAN+1
C
C     RETURN IMMEDIATELY IF PLOT FULL
C
      IF(IFLAG.EQ.1) RETURN
C
C     CONVERT DATA TO LOG PLOT
C
      DO 50 I=1,128
      YTEM = Y(I)
      IF(YTEM.LT.0.100) YTEM = 0.100
      YY(I) = 0.10+0.2000*ALOG10(YTEM)
   50 CONTINUE
      IPEN = -3
      XSCANO = 0.04
      YSCANO = 0.1
      CALL PLOT(XSCANO,YSCANO,IPEN)
      IPEN = 3
      CALL PLOT(X(1),YY(1),IPEN)
      IPEN = 2
      DO 60 I=2,128
      CALL PLOT(X(I),YY(I),IPEN)
   60 CONTINUE
      IF(ISCAN.LE.29) RETURN
      ISCAN = 0
      IPLTN = IPLTN+1
      IF(IPLTN.LE.6) GO TO 15
      IFLAG = 1
      RETURN
      END




      SUBROUTINE PSDINT(X,M)
C
C     POWER SPECTRAL DENSITY BASED ON ALGORITHM
C     PRESENTED BY C. M. RADER IN 'AN IMPROVED
C     ALGORITHM FOR HIGH SPEED AUTOCORRELATION
C     WITH APPLICATION TO SPECTRAL ESTIMATION,'
C     IEEE TRANS AUDIO ELECTROACOUSTICS, VOL AU-18, DEC 70
C
C     X    = VECTOR OF INPUT SAMPLES
C     M    = POWER OF 2 FOR NUMBER OF SAMPLES
C     IWIN = 0 NO WINDOW
C            1 HAMMING (ALPHA = 0.54)
C            2 BARTLETT
C            3 BLACKMAN
C            4 HANNING
C
C     FIRST CALL IS TO PSDINT AND THEN EACH SUCCESSIVE
C     CALL FOR THAT STRING OF DATA SHOULD BE TO PSD.
C     TO START A FRESH STRING OF DATA CALL PSDINT AGAIN.
C
      DIMENSION X(256),IWK(11)
      COMPLEX XN(512),XNP(512),YN(512),AI(512)
      DATA XN,XNP/1024*(0.0,0.0)/
      MM = M+1
      N = 2**M
      NN = 2*N
C
C     SPECIFY COEFFICIENTS NEEDED IN ADDITION
C     OF NEXT X(F) VECTOR TO CURRENT X(F) VECTOR
C     TO MAKE Y(F) VECTOR, IN BINARY REVERSE ORDER.
C
      NNN = NN-1
      DO 90 I=1,NNN,2
      AI(I) = (1.0,0.0)
      II = I+1
      AI(II) = (-1.0,0.0)
   90 CONTINUE
      CALL FFRDR2(AI,MM,IWK)
      AIMG = 0.0
      DO 101 I=1,N
      XN(I) = CMPLX(X(I),AIMG)
  101 CONTINUE
C
C     FFT OF CURRENT X(T) VECTOR, LAST HALF ZERO.
C
      CALL FFT2(XN,MM,IWK)
      RETURN
C
C     USE THIS ENTRY FOR EACH FRAME AFTER FIRST
C
      ENTRY PSD(X,M,IWIN)
      MM = M+1
      N = 2**M
      NN = 2*N
      AN = FLOAT(N)
      ANN = FLOAT(NN)
      AIMG = 0.0
      DO 110 I=1,N
      XNP(I) = CMPLX(X(I),AIMG)
  110 CONTINUE
C
C     FFT OF NEXT X(T) VECTOR, LAST HALF ZERO.
C
      CALL FFT2(XNP,MM,IWK)
C
C     FORM Y(F) VECTOR, COEFF IN REV BINARY ORDER.
C
      DO 120 I=1,NN
      YN(I) = (XN(I)+AI(I)*XNP(I))*CONJG(XN(I))
  120 CONTINUE
C
C     FORM CONJG TO PERFORM INV DFT
C
      DO 130 I=1,NN
      YN(I) = CONJG(YN(I))
  130 CONTINUE
C
C     INV FFT OF Y(F) GIVES RXX(TAU)
C
      CALL FFT2RV(YN,MM,IWK)
      DO 140 I=1,NN
      YN(I) = CONJG(YN(I))/ANN
  140 CONTINUE
      CALL WIND2(YN,N,IWIN)
      CALL FFT2(YN,M,IWK)
      CALL FFRDR2(YN,M,IWK)
      DO 150 I=1,N
      X(I) = CABS(YN(I))/(AN**2)
  150 CONTINUE
C
C     MOVE NEXT X(F) INTO CURRENT X(F)
C
      DO 160 I=1,NN
      XN(I) = XNP(I)
      XNP(I) = (0.0,0.0)
  160 CONTINUE
      RETURN
      END
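PSDINT and PSD implement Rader's block method. Each block x_i is zero-padded to 2N points. The 2N-point transform of block i followed by block i+1 is then Y_i(k) = X_i(k) + (-1)^k X_{i+1}(k). The inverse transform of conj(X_i(k))*Y_i(k) gives lags 0 to N-1 of the autocorrelation contributed by block i, with no wrap-around.

The Python sketch below checks that identity against a direct sum. It differs from the listing in two ways:

- It uses a plain O(N^2) DFT in place of the FFT2, FFT2RV and FFRDR2 routines.
- It accumulates over all blocks, whereas PSD as listed forms one block's contribution per call.

The function names are illustrative.

```python
import cmath

def dft(x, inverse=False):
    """Direct O(n^2) DFT; inverse=True includes the 1/n factor."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for m in range(n):
            acc += x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
        out.append(acc / n if inverse else acc)
    return out

def rader_autocorr(x, n):
    """Lags 0..n-1 of sum_t x[t]*x[t+lag], computed block by block (Rader)."""
    x = list(x)
    x += [0.0] * (-len(x) % n)                       # pad to whole blocks
    blocks = [x[i:i + n] for i in range(0, len(x), n)]
    spectra = [dft(b + [0.0] * n) for b in blocks]   # each block zero-padded to 2n
    spectra.append([0j] * (2 * n))                   # nothing follows the last block
    acc = [0j] * (2 * n)
    for i in range(len(blocks)):
        xi, xnext = spectra[i], spectra[i + 1]
        for k in range(2 * n):
            yk = xi[k] + (-1) ** k * xnext[k]        # transform of [x_i, x_(i+1)]
            acc[k] += xi[k].conjugate() * yk
    r = dft(acc, inverse=True)
    return [v.real for v in r[:n]]
```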


      SUBROUTINE WIND2(B,N,IWIN)
      COMPLEX B(1)
      DATA PI,TWOPI/3.1415926,6.283185/
      AN = FLOAT(N)
      GO TO (200,300,400,100),IWIN
      RETURN
C
C     HANNING
C
  100 DO 190 I=1,N
      AJ = FLOAT(I-1)
      F = 0.5*(1.0-COS((TWOPI*AJ)/(AN-1.0)))
      B(I) = B(I)*F
  190 CONTINUE
      RETURN
C
C     HAMMING
C
  200 DO 290 I=1,N
      AJ = FLOAT(I-1)
      F = 0.54-0.46*COS((TWOPI*AJ)/(AN-1.0))
      B(I) = B(I)*F
  290 CONTINUE
      RETURN
C
C     BARTLETT
C
  300 DO 390 I=1,N
      AJ = FLOAT(I-1)
      IF(I.LE.(N/2)) F = 2.0*AJ/(AN-1.0)
      IF(I.GT.(N/2)) F = 2.0-2.0*AJ/(AN-1.0)
      B(I) = B(I)*F
  390 CONTINUE
      RETURN
C
C     BLACKMAN
C
  400 DO 490 I=1,N
      AJ = FLOAT(I-1)
      F = 0.42-0.5*COS(TWOPI*AJ/(AN-1.0))+0.08*
     * COS(4.0*PI*AJ/(AN-1.0))
      B(I) = B(I)*F
  490 CONTINUE
      RETURN
      END


APPENDIX A.4


NINE TRACK TO SEVEN TRACK TAPE CONVERSION
PROGRAM


      DIMENSION DAT(1024),IDAT(1024)
      FACTOR = (2.0**23)/250.0
      MTEST = 2**23-1
      LTEST = -MTEST
      NFILES = 6
      REWIND 2
      REWIND 4
      N = 1024
      DO 200 I=1,NFILES
      WRITE(6,14) I
   14 FORMAT(' FILE',I4)
      DO 100 J=1,50
      READ(2,15,END=190,ERR=30) DAT
   15 FORMAT(128A4)
      GO TO 50
   30 WRITE(6,21)
   21 FORMAT(60X,'READ ERROR')
   50 WRITE(6,16) J
   16 FORMAT(10X,'RECORD HAS BEEN READ',I4)
      IF(J.EQ.1) WRITE(6,17) DAT
   17 FORMAT(1X,10F12.3)
      DO 80 K=1,1024
      IDAT(K) = IFIX(DAT(K)*FACTOR)
      IF(IDAT(K).GT.MTEST) WRITE(6,18) I,J,K
   18 FORMAT(40X,'TOO LARGE FILE',I4,' RECORD',I4,' ITEM',I4)
      IF(IDAT(K).LT.LTEST) WRITE(6,19) I,J,K
   19 FORMAT(40X,'TOO SMALL FILE',I4,' RECORD',I4,' ITEM',I4)
   80 CONTINUE
      IF(J.EQ.1) WRITE(6,20) IDAT
   20 FORMAT(1X,10I12)
      CALL MORF(IDAT,N)
      WRITE(4,25) IDAT
   25 FORMAT(128(8A4))
  100 CONTINUE
      WRITE(6,26)
   26 FORMAT(5X,'ALL 50 RECORDS READ')
   55 READ(2,15,END=190) DAT
      GO TO 55
  190 WRITE(6,27)
   27 FORMAT(2X,'END OF FILE')
      ENDFILE 4
  200 CONTINUE
      STOP
      END
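The conversion program scales each value by FACTOR = 2**23/250 and truncates it to an integer with IFIX. Any result outside the symmetric range ±(2**23 - 1), which fits in 24 bits, is reported but not clipped. The Python sketch below mirrors that step; Python's int() truncates toward zero, as IFIX does. The constant names follow the listing, and `to_int24` is an illustrative name.

```python
FACTOR = 2.0 ** 23 / 250.0
MTEST = 2 ** 23 - 1
LTEST = -MTEST

def to_int24(values):
    """Scale and truncate each value; return (integers, [(item, problem)])."""
    ints, flags = [], []
    for k, v in enumerate(values, start=1):
        i = int(v * FACTOR)            # truncation toward zero, like IFIX
        if i > MTEST:
            flags.append((k, "too large"))
        elif i < LTEST:
            flags.append((k, "too small"))
        ints.append(i)                 # stored unchanged, as in the listing
    return ints, flags
```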




APPENDIX B.1  COMPUTER ANALYSIS AND MODIFICATION OF VOICED
SPEECH

The 15 frame (384 msec.) segment of speech analyzed in
this appendix is the "long e" sound (as in need) and is
spoken by a woman. The process illustrated shows both
direct reconstruction and reconstruction with the pitch
reduced by a factor of 0.58 and the formant frequencies
reduced by a factor of 0.88.


Figure B.1.1  WAVEFORM OF INPUT SPEECH

Figure B.1.2  LOGARITHMIC POWER SPECTRAL DENSITY OF INPUT SPEECH

Figure B.1.3(a)  Processing Summary of Frame

Figure B.1.3(d)  Processing Summary of Frame

Figure B.1.3(e)  Processing Summary of Frame

Figure B.1.6(a)  VOCAL TRACT POLE LOCATIONS
APPENDIX B.2  COMPUTER ANALYSIS AND MODIFICATION OF

The 15 frame (384 msec.) segment of speech analyzed in
this appendix is the "sa" sound (beginning of salt) and is
spoken by a woman. The process illustrated shows both
direct reconstruction and reconstruction with the pitch
reduced by a factor of 0.58 and the formant frequencies
reduced by a factor of 0.88.
Figure B.2.3(a)  Processing Summary of Frame

Figure B.2.3(b)  Processing Summary of Frame

Figure B.2.3(c)  Processing Summary of Frame

Figure B.2.3(d)  Processing Summary of Frame

Figure B.2.3(e)  Processing Summary of Frame

Figure B.2.6(a)  VOCAL TRACT POLE LOCATIONS

Figure B.2.6(b)  VOCAL TRACT POLE LOCATIONS

Figure B.2.9  LOGARITHMIC POWER SPECTRAL DENSITY OF UNMODIFIED OUTPUT SPEECH


APPENDIX  C DESCRIPTION  OF  VOICE  TAPE 


The audio recording which is available from the author has
four sections, each of which contains three segments of
speech. These three speech segments are of the following
sounds:

Segment 1 - Five long vowels.

"a e i o u"

Segment 2 - Four words which are combinations of
fricatives and voiced sounds.

"sat free hip done"

Segment 3 - A sentence with a variety of sounds.

"Every salt breeze comes from the sea."

Each of these segments is repeated in each section of the
tape. Each section of the tape shows the effects of a
different step in the processing.

Section 1 - Unprocessed speech, the recording used
for input to the processing system.

Section 2 - Speech which has been converted to
digital form and then converted back to analog
form with no other processing.

Section 3 - Speech which has been encoded into a
set of LPC parameters and then decoded using the
same parameters (i.e. no modification).

Section 4 - Speech which has been encoded into a
set of LPC parameters and those parameters altered
to reduce the pitch frequency by a factor of 0.56
and to reduce the formant frequencies by a factor
of 0.88. The same LPC decoding process is then
used to reconstruct the speech segment.